Automated design of multi-target ligands by generative deep learning

Laura Isigkeit; Tim Hörmann; Espen Schallmayer; Katharina Scholz; Felix F Lillich; Johanna H M Ehrler; Benedikt Hufnagel; Jasmin Büchner; Julian A Marschner; Jörg Pabel; Ewgenij Proschak; Daniel Merk

doi:10.1038/s41467-024-52060-8

. 2024 Sep 11;15:7946. doi: 10.1038/s41467-024-52060-8

Automated design of multi-target ligands by generative deep learning

Laura Isigkeit ¹, Tim Hörmann ², Espen Schallmayer ¹, Katharina Scholz ², Felix F Lillich ^1,³, Johanna H M Ehrler ¹, Benedikt Hufnagel ¹, Jasmin Büchner ¹, Julian A Marschner ², Jörg Pabel ², Ewgenij Proschak ^1,³, Daniel Merk ^1,^2,^✉

PMCID: PMC11390726 PMID: 39261471

Abstract

Generative deep learning models enable data-driven de novo design of molecules with tailored features. Chemical language models (CLM) trained on string representations of molecules such as SMILES have been successfully employed to design new chemical entities with experimentally confirmed activity on intended targets. Here, we probe the application of CLM to generate multi-target ligands for designed polypharmacology. We capitalize on the ability of CLM to learn from small fine-tuning sets of molecules and successfully bias the model towards designing drug-like molecules with similarity to known ligands of target pairs of interest. Designs obtained from CLM after pooled fine-tuning are predicted active on both proteins of interest and comprise pharmacophore elements of ligands for both targets in one molecule. Synthesis and testing of twelve computationally favored CLM designs for six target pairs reveals modulation of at least one intended protein by all selected designs with up to double-digit nanomolar potency and confirms seven compounds as designed dual ligands. These results corroborate CLM for multi-target de novo design as source of innovation in drug discovery.

Subject terms: Computational chemistry, Cheminformatics, Medicinal chemistry

Chemical language models are deep learning models trained with molecules in string representation. They enable data-driven de novo design of molecules with tailored features. Here, the authors used chemical language models to design multi-target ligands.

Introduction

Many prevalent and severe diseases such as the metabolic syndrome (MetS) and chronic inflammatory disorders are multifactorial and involve dysregulation of several signaling systems or metabolic pathways (Fig. 1a)^1–5. While typical treatment regimens of multifactorial diseases and multimorbidity heavily rely on drug combinations^2,6,7 (also referred to as polypharmacy), designed polypharmacology, i.e., the development of molecules intentionally and simultaneously modulating more than one target involved in a disease, can achieve synergies, improve therapeutic outcomes, and overcome multi-drug treatment^8,9. However, designing molecules that intentionally exhibit potent activity on several proteins of interest is challenging since compliance with pharmacophores for two or more target binding sites has to be achieved simultaneously^8,9. Machine learning is increasingly used to accelerate innovation in drug discovery¹⁰ and might also aid the development of designed multi-target drugs.

Chemical language models (CLM, Fig. 1b)^11–13 are deep learning models trained to capture string representations of molecules (i.e., the Simplified Molecular Input Line Entry System¹⁴, SMILES) and can be used to automatically design molecules with desired properties^15–18. CLM have been successfully employed for de novo design of new chemical entities with intended bioactivity^19–22 and enable navigation in the chemical space to obtain tailored molecular designs^11,23. The data-driven de novo design approach using CLM is “rule-free” as CLM are capable of extracting relevant features, e.g., for bioactivity from the training molecules²³. Therefore, CLM may also serve the task of multi-target design by accessing regions of the chemical space common to ligands of two proteins.

The development of a CLM typically proceeds in two steps^20,24. Initial pretraining with a large collection of molecules allows the model to capture the syntax of SMILES. The pretrained general CLM is subsequently fine-tuned by transfer learning²⁵ with a small set of molecules comprising the intended features which introduces a bias towards a region of interest in the chemical space^11,23. The fine-tuned model can then be employed to design molecules with desired properties de novo. The critical step of fine-tuning has been successfully implemented with one to tens of known active molecules²². This ability of CLM to capture relevant features for bioactivity purely from molecular templates and to work in low-data regimes suggests great potential to enable data-driven multi-target design.

In this study, we systematically evaluated and optimized the application of CLM to design dual ligands for protein pairs of interest. We focused on combinations of six targets with a validated role in MetS associated disorders: (1) angiotensin II receptor type 1 (AT₁), which exhibits a key signaling role in the cardiovascular system²⁶, (2) farnesoid X receptor (FXR), a crucial hepatoprotective transcription factor and metabolic regulator²⁷, (3) free fatty acid receptor 1 (FFAR1, GPR40), which is involved in the control of insulin and gut hormone secretion²⁸, (4) peroxisome proliferator-activated receptor δ (PPARδ), a major regulator of lipid and glucose metabolism in multiple tissues²⁹, (5) thyroid hormone receptor β (THRβ), which is under evaluation as a target in non-alcoholic fatty liver disease³⁰, and (6) soluble epoxide hydrolase (sEH), which is implicated in cardiovascular and inflammatory diseases^31,32. These targets cover a broad range of protein families, i.e., G-protein coupled receptors (AT₁, FFAR1), nuclear receptors (FXR, PPARδ, THRβ) and enzymes (sEH), and ligands of these proteins have therapeutic relevance or were clinically evaluated for diseases related to the MetS. Of note, some approved drugs selectively targeting one of these proteins are regularly used together in combination therapy³³ underlining the potential of designed dual ligands in MetS.

We trained CLM to generate dual ligands of six target pairs and synthesized top-ranking candidates for prospective validation. All twelve CLM designed compounds exhibited biological activity on at least one of the intended targets and seven dual modulators were successfully obtained for three target pairs. These results underscore the value of CLM to access desired regions of the chemical space and corroborate their application for automated de novo design of multi-target ligands.

Results

CLM capture molecular features of pooled ligand classes

To estimate the potential of pairs of the targets of interest to bind a common dual ligand, we evaluated the chemical space covered by their known modulators. Of note, while the numbers of known ligands for the targets of interest were high (Fig. 2a), dual ligands have only been annotated for FXR/sEH (34; potency cutoff <100 µM) and FXR/THRβ (1), and the pairwise similarity between known ligand sets for the targets of interest was low (Fig. 2b). Despite high chemical diversity within all known ligand sets for the targets of interest (EC₅₀/IC₅₀ ≤1 µM; mean ± SD pairwise Tanimoto similarity computed on Morgan fingerprints³⁴: 0.17 ± 0.02 (sEH) to 0.19 ± 0.03 (AT₁)), the ligands of all six targets populated defined and distinct regions of the chemical space (Fig. 2c) as illustrated by t-distributed stochastic neighbor embedding (t-SNE) with Morgan fingerprints or the fuzzier pharmacophore descriptors Chemically Advanced Template Search (CATS)³⁵. Among the target pairs, an overlap was evident for FXR agonists and sEH inhibitors as well as FXR and THRβ agonists. Additionally, PPARδ agonists and sEH inhibitors populated proximal regions of the chemical space while ligands of FFAR1, AT₁, and sEH were more distant and hence less similar. This was also evident from scaffold analysis indicating that known ligands of FXR and sEH as well as PPARδ and sEH (potency cutoff <100 µM) shared a higher number and more complex (higher no. of atoms) scaffolds than for the other target pairs (Fig. 2d). These results pointed to a substantially higher potential to obtain dual ligands for FXR/sEH, FXR/THRβ, and PPARδ/sEH than for other pairs prompting us to focus on these combinations first.

Fig. 2 — a Numbers of known ligands for the targets of interest (potency cutoff <100 µM) annotated in ChEMBL. Each plot represents a target pair of interest. b Violin plots of pairwise Tanimoto similarity distribution between ligands for the target pairs of interest. Numbers of ligands for FXR: 1412, sEH: 1920, PPARδ: 1796, THRβ: 546, AT₁: 1386, FFAR1: 1650. c Relative distribution of known ligands for the six targets of interest (FXR, sEH, THRβ, PPARδ, AT₁, FFAR1) illustrated as t-distributed stochastic neighbor embedding (t-SNE) (computed on Morgan fingerprints or Chemically Advanced Template Search (CATS) descriptors) with 10,000 random ChEMBL molecules as background. Known ligands of the target pairs FXR/sEH, FXR/THRβ and PPARδ/sEH populate proximal regions of the chemical space whereas AT₁/sEH, FFAR₁/sEH and AT₁/FFAR1 modulators are more distant (modulators with potency ≤1 µM, shaded areas). The selected fine-tuning molecules are highlighted as stars. d Numbers of common scaffolds in the known ligands (potency cutoff <100 µM) of the target pairs annotated in ChEMBL and their max. number of atoms. e Mean and max. Tanimoto similarity of molecules selected for the fine-tuning sets for the targets of interest. f Violin plots of potency distribution of known ligands for the targets of interest. Compounds selected for fine-tuning are highlighted as stars. The lines represent the 1st and 3rd quartiles and the mean of the distribution. Numbers of ligands with potency ≤1 µM for FXR: 693, sEH: 1601, PPARδ: 1019, THRβ: 370, AT₁: 1232, FFAR1: 1048. Source data are provided as a Source Data file.

We intended to achieve multi-target design with CLM using tailored template molecule sets covering ligands for a target pair for fine-tuning. Thus, we retrieved known binders of the proteins of interest from BindingDB³⁶ and clustered the ligands of each target based on their fingerprint similarity. Only the most potent compound from each cluster was kept to cover the entire known ligand space for each target and promote chemical diversity of the training molecules. After manual validation of the intended biological activity (e.g., agonism vs. antagonism) and binding mode (e.g., orthosteric vs. allosteric), 5–9 compounds per target were obtained as fine-tuning sets (Supplementary Table 1). The selected fine-tuning molecules were chemically diverse (Fig. 2e) and represented potent modulators of the targets of interest (Fig. 2f). All templates had no annotated dual activity on the studied targets.

For multi-target design model development, we built on a previously published²⁰ CLM which was pretrained on 365k molecules from ChEMBL³⁷ to capture SMILES syntax and general molecular properties. To achieve multi-target design, this general model was fine-tuned with the template molecule sets for a target pair. Three fine-tuning strategies, (1) sequential, (2) alternating and (3) pooled fine-tuning, were evaluated using beam search²⁰ (width 50) to identify the best performing approach (Fig. 3a and Supplementary Fig. 1a). Beam search sampling is a heuristic method to reveal molecules with high probability to be sampled from a CLM and proved valuable to monitor fine-tuning effects^20,22. Evaluation of beam search designs sampled during the transfer learning procedure for similarity to the fine-tuning molecules demonstrated substantial differences in the fine-tuning strategies.

Fig. 3 — The target pair PPARδ/sEH is shown as example in (a–e). a Effects of different CLM fine-tuning strategies on the similarity of beam search designs (width 50) to the fine-tuning molecules. Fine-tuning with pooled template sets was superior to sequential and alternating fine-tuning strategies in terms of design similarity to both fine-tuning collections. Graphs show the max. Tanimoto similarity ± SD (standard deviation of max. Tanimoto similarity) computed on Morgan fingerprints of the beam search designs per epoch to the fine-tuning molecules. For each epoch beam search designs (width 50) were generated and only valid SMILES were analyzed. b Quantitative estimation of drug-likeness (QED) scores³⁸ and basic molecular features of beam search designs over the fine-tuning procedure (epochs 0, 15, 30, 45, 60) illustrated as violin plots. Stars represent the fine-tuning molecules and beam search designs, respectively. Numbers of analyzed ligands/designs PPARδ: 5, sEH: 6, epochs 0: 15, epoch 15: 23, epoch 30: 17, epoch 45: 10, epoch 60: 13 (only valid SMILES from beam search were analyzed). c Effects of pooled fine-tuning on design features (visualized as t-distributed stochastic neighbor embedding (t-SNE)). Top beam search designs from the epochs of highest similarity to both fine-tuning sets (51–55) in comparison to beam search designs from epoch 0 are shown. Numbers of analyzed ligands/designs PPARδ: 5, sEH: 6, epoch 0: 15, epochs 51–55: 249. d Target prediction (Z-scores) of designs from epochs 51–55 using the Similarity Ensemble Approach (SEA)³⁹ for the targets of interest. Z-Scores are the mean ± standard deviation (SD). e Synthetic accessibility scores⁴⁰ for the beam search designs obtained from different epochs during pooled fine-tuning visualized as violin plots. Fine-tuning molecules and beam search designs are highlighted as stars, respectively, and the middle line represents the mean of the distribution. Numbers of analyzed ligands/designs PPARδ: 5, sEH: 6, epochs 0: 15, epoch 15: 23, epoch 30: 17, epoch 45: 10, epoch 60: 13 (only valid SMILES from beam search were analyzed). Source data are provided as a Source Data file.

Sequential fine-tuning successfully biased the model towards the first target of each pair within few epochs. However, similarity to known ligands of the first target was also rapidly lost when template molecules for the second target were introduced. Alternating fine-tuning had a similar effect with higher frequency and flipped the bias to either of the targets from epoch to epoch. The intended balanced similarity to both targets of a pair was only achieved by pooling both fine-tuning sets.

De novo designs obtained from the CLM after pooled fine-tuning exhibited favorable validity, novelty and uniqueness (Supplementary Tables 2 and 3) and had quantitative estimation of drug-likeness (QED) scores³⁸ (Fig. 3b and Supplementary Fig. 1b) in the range of the template molecules suggesting that the multi-target design approach by CLM generated novel drug-like molecules. With an increasing number of fine-tuning epochs, the designs approached the pooled template sets in terms of basic features such as molecular weight (MW), clogP and topological polar surface area (TPSA; Fig. 3b and Supplementary Fig. 1b). A bias towards the chemical space around and between the template molecules after fine-tuning was also evident by visualization with t-SNE (Fig. 3c) which revealed that beam search designs from the selected epochs populated a region between known ligands for the targets of interest. Moreover, external target prediction using the Similarity Ensemble Approach (SEA)³⁹ demonstrated substantially enhanced probability for interaction with the targets of interest after fine-tuning compared to baseline designs from epoch 0 (Fig. 3d). Synthetic accessibility⁴⁰ of the designs was not compromised by pooled fine-tuning and remained in the favorable range of the template molecules (Fig. 3e).

CLM designed dual ligands modulate both intended targets

With these encouraging results, we set out to test the potential of CLM based multi-target design prospectively. From the dual-target CLMs obtained by pooled fine-tuning on ligand sets for FXR/sEH, FXR/THRβ and PPARδ/sEH, we identified epochs complying best with the design criteria, i.e., balanced similarity to the fine-tuning sets for both targets (epochs 51–55 for PPARδ/sEH; cf. Fig. 3c), and sampled 5000 designs by temperature sampling²³. From each virtual collection, we selected twelve designs based on sampling frequency²² as model-intrinsic approach for automated design prioritization without external scoring, and twelve designs based on fingerprint similarity to the fine-tuning molecules for further evaluation (Supplementary Table 4). Three compounds per target pair were prioritized from these collections by docking for prospective validation by synthesis and in vitro characterization. Two compounds (1, 2) for the FXR/sEH target pair resembled previously developed dual FXR/sEH modulators⁴¹ thus strongly pointing to dual activity. Of note, these previously known dual FXR/sEH ligands were not used to develop the model. As an additional proof of concept, we applied the design approach to the FXR/PPARδ target pair (Supplementary Table 5) for which dual agonists have been previously obtained by rational design⁴². A CLM fine-tuned with the pooled selective FXR and PPARδ ligand template sets (not containing dual ligands) successfully generated known dual FXR/PPARδ agonists and several structurally similar molecules supporting the applied design strategy.

The CLM designed dual ligand candidates 3–9 were successfully synthesized over 2–6 steps with 1–68% overall yield (Fig. 4) demonstrating that the dual target CLM generated synthesizable molecules. Compound 7 was originally designed with tert-butyloxy motif which was replaced by tert-butyl for improved chemical stability. The dual ligand candidates were characterized in functional assays for the targets of interest using cellular Gal4-hybrid reporter gene assays⁴³ to determine FXR, THRβ and PPARδ modulation and an enzyme activity assay⁴⁴ with the fluorogenic substrate PHOME to test sEH inhibition (Table 1). In vitro characterization of the CLM designed dual FXR/sEH modulators 1–3 confirmed engagement of both targets by all three compounds with sub-micromolar to low micromolar potencies. The dual FXR/THRβ agonist designs obtained from the model exhibited substantial FXR agonism combined with weak (6) or no detectable (4, 5) THRβ agonism at non-toxic concentrations. All designed PPARδ/sEH modulators (7–9) were confirmed as dual ligands of the intended targets. 7 emerged as remarkably potent sEH inhibitor endowed with micromolar PPARδ agonist activity. 8 and 9 were less potent on both targets.

Table. 1.

Activity of dual ligand designs 1–9 on the targets of interest

ID	Structure	In vitro biol. activity
1		FXR: EC₅₀ = 0.37 µM sEH: IC₅₀ = 4.1 µM
2		FXR: EC₅₀ = 0.44 µM sEH: IC₅₀ = 8.1 µM
3		FXR: 3.4-fold act. at 50 µM sEH: IC₅₀ = 21 µM
4		FXR: EC₅₀ = 12 µM THRβ: inactive
5		FXR: EC₅₀ = 11 µM THRβ: inactive
6		FXR: EC₅₀ = 1.0 µM THRβ: 5-fold act. at 15 µM
7		PPARδ: EC₅₀ = 10 µM sEH: IC₅₀ = 0.017 µM
8		PPARδ: 6.7-fold act. at 80 µM sEH: IC₅₀ = 0.20 µM
9		PPARδ: 3.0-fold act. at 80 µM sEH: IC₅₀ = 0.27 µM

Open in a new tab

At least one dual ligand was identified for all three target pairs. Activities were determined in Gal4-hybrid reporter gene assays (FXR, THRβ, PPARδ) and an enzyme activity assay (sEH). Activity data are the mean; n ≥ 3.

These results from in vitro testing corroborated multi-target design by CLM. Seven out of nine dual ligands designed by the models modulated both intended targets. With potencies varying between 17 nM and >100 µM, and mostly unbalanced activity on the respective target pair, the CLM designs require structural optimization, but their scaffolds are highly valuable leads for designed polypharmacology.

CLM designs fuse molecular and pharmacophore features of ligands for two targets

The predicted binding modes of the dual modulators 1, 6 and 7 to the targets of interest (Fig. 5a–c) revealed key interactions that were also formed by the co-crystallized ligands. The designs thus complied with the pharmacophore requirements of the respective binding sites indicating that the CLM could capture relevant pharmacophore elements for the respective two binding sites and fuse them in one molecule. This was also evident from structural comparison of the CLM designed dual ligands with the most similar ligands annotated in ChEMBL for the respective targets (Fig. 5d). The dual FXR/THRβ modulator 6 combined the well-known 5-alkyl-3-phenylisoxazole motif of FXR agonists with a (4-benzylphenoxy)acetic acid residue inherited from THRβ agonists. The latter group at the same time resembled aromatic linkers found in FXR agonists bearing the essential carboxylic acid motif. Similarly, the dual PPARδ/sEH ligand 9 comprised a hydrophobic urea scaffold widely found in sEH inhibitors which was merged with a 3-(4-alkyloxyphenyl)propanoic acid pharmacophore of PPARδ agonists. Dual ligands designed by the CLM thus comprised merged pharmacophores^8,9, were drug-like with similar QED scores as the template molecules and exhibited preferable synthetic accessibility (Fig. 5e and Supplementary Fig. 2). Due to the structural fusion of ligand features for the proteins of interest, the designed dual ligands exhibited similarity to known modulators of both their targets (Supplementary Table 6) and resembled the respective fine-tuning sets in basic properties like MW, clogP and TPSA (Fig. 5f and Supplementary Fig. 2).

Comparison of the designed dual ligands with the fine-tuning molecules (Fig. 6a and Supplementary Fig. 3) indicated that the CLM fused pharmacophore elements with varying degrees of structural diversity. The designs exhibited low to intermediate Tanimoto similarity to their respective most similar fine-tuning templates and Rapid Similarity Calculation of Maximum Common Edge Subgraph (RascalMCES)⁴⁵, which identify maximum common sub-skeletons of pairs of molecules, revealed substructure contributions from both selective templates to the dual modulators. To verify that the polypharmacology of the dual ligands was indeed due to the pharmacophore fusion and structural modification designed by the CLM, we tested the commercially available fine-tuning compounds for PPARδ/sEH as representative target pair in the same assays as the designed dual ligands (Table 2). The evaluated fine-tuning molecules were selective for their annotated bioactivity over the second target of the respective designs providing further support for the ability of CLM to fuse selective pharmacophores to dual ligands. Although a potential bias from other molecules used to (pre-)train the CLMs, which act as dual ligands but have not been tested on the targets of interest, cannot be fully excluded, these results indicate that structural fusion and moderate structural modification by the CLM compared to the selective templates was driving the designed polypharmacology.

Table. 2.

Activity of PPARδ/sEH fine-tuning molecules on the targets of interest

EC₅₀(PPARδ) [µM]	IC₅₀(sEH) [µM]	Tanimoto
0.016	>30	0.17; 0.13; 0.13
0.022	>30	0.20; 0.15; 0.16
0.009	>30	0.13; 0.11; 0.11
>50	0.005	0.55; 0.25; 0.27
>50	0.097	0.38; 0.19; 0.19
>50	<0.001	0.11; 0.13; 0.13

Open in a new tab

Commercially available fine-tuning molecules for PPARδ/sEH were experimentally confirmed selective for their annotated bioactivity over the respective second target in the assays used to characterize the dual PPARδ/sEH ligand designs 7–9. Data are the mean; n = 3. Tanimoto refers to the fine-tuning molecules’ similarity to the experimentally tested dual ligand designs 7–9.

To validate our design approach further, we evaluated whether the CLM learned target-specific SAR features from the pooled dual fine-tuning by comparing with designs from a native CLM without fine-tuning and with designs obtained after fine-tuning with the pooled templates for all four targets (Fig. 6b–d). In contrast to pooled dual fine-tuning, the Tanimoto similarity to the template sets did not consistently increase when the CLM was fine-tuned with the pooled templates for FXR, THRβ, PPARδ and sEH (Fig. 6b) suggesting potentially that simultaneous training with molecular information for four targets was too diverse to extract the relevant structural features or too complex to be structurally fused. Similarly, target prediction for the designs obtained after pooled fine-tuning for four targets revealed increasing scores but no consistent balanced improvement and strong variance over the fine-tuning procedure (Fig. 6c).

We then used pharmacophore models for the targets of interest (Fig. 6d and Supplementary Fig. 4) to estimate the rates of designed dual ligands obtained from a native baseline CLM, after pooled fine-tuning for target pairs (dual), and after fine-tuning with the pooled templates for all four targets. The fraction of designs from the baseline CLM matching both pharmacophore models of a target pair of interest was small (Fig. 6d) aligning with very low scores in the target prediction for the native model (Fig. 6c, epoch 0—baseline model). Pooled fine-tuning with the templates for all four targets slightly enhanced the fraction of designs matching pharmacophore models for the target pairs but was inferior to the focused dual models. These results further corroborated the pooled fine-tuning for target pairs to achieve dual ligand design by CLM.

Dual modulator design by CLM fails for target pairs with highly diverse ligands

Intrigued by the favorable results for CLM-driven dual ligand design we next addressed the target pairs whose known ligands populated more distant regions in the chemical space (Fig. 2c). Using the same procedures that successfully generated dual FXR/sEH, FXR/THRβ, and PPARδ/sEH modulators, we designed dual ligand candidates for AT₁/sEH, FFAR1/sEH, and AT₁/FFAR1 and selected one design per target pair for synthesis based on sampling frequency and subsequent docking. Again, the computationally favored designs 10–12 were synthesizable with good overall yields (14–40% over 4 steps; Fig. 7a). For improved accessibility, the dual FFAR1/sEH modulator design 11 was slightly modified by replacing a cyclopropyl substituent suggested by the CLM with an ethyl group. In vitro testing in cellular assays for AT₁ and FFAR1, and an enzyme activity assay for sEH revealed biological activity for all three designs (Table 3) but no design exhibited the intended dual modulation. The dual AT₁/sEH (10) and FFAR1/sEH (11) modulator candidates potently inhibited sEH but were inactive on the intended GPCRs. The dual AT₁/FFAR1 design 12 activated FFAR1 but showed no effect on AT₁ activity. Despite only testing one design per target pair, these results may suggest that the CLM was able to capture features relevant for modulation of the targets of interest but failed to access a chemical space region of dual ligands. However, based on the different distribution of known modulators for these targets in the chemical space (cf. Fig. 2c) it can be speculated whether such common region exists. Of note, ChEMBL contains no molecule with an annotated dual activity at <100 µM on AT₁/sEH, FFAR1/sEH or AT₁/FFAR1 (cf. Fig. 2a). The failure to access a chemical space of dual ligands for the targets of interest could also be observed by target prediction (SEA, Fig. 7b) which suggested improvement for only one target (FFAR1/sEH) or varying performance (AT₁/FFAR1). Only the dual AT₁/sEH focused designs tended to have a higher probability for activity on both targets at later fine-tuning epochs.

Fig. 7 — a Synthesis of 10–12. b Target prediction (Z-scores) of designs obtained by beam search over the fine-tuning procedure using the Similarity Ensemble Approach (SEA)³⁹ for the target of interest. Z-Scores are the mean ± standard deviation (SD). For each epoch beam search designs (width 50) were generated and only valid SMILES were analyzed. Source data are provided as a Source Data file.

Table 3.

Activity of dual ligand designs 10–12 on the targets of interest

Structure

In vitro biol. activity

graphic file with name 41467_2024_52060_Tabp_HTML.gif

AT₁: inactive

sEH: IC₅₀ = 0.02 µM

graphic file with name 41467_2024_52060_Tabq_HTML.gif

FFAR1: inactive

sEH: IC₅₀ = 0.061 µM

graphic file with name 41467_2024_52060_Tabr_HTML.gif

FFAR1: EC₅₀ = 8.5 µM

AT₁: inactive

Open in a new tab

The dual AT₁/sEH (10), FFAR1/sEH (11) and FFAR1/AT₁ (12) ligand designs modulated only one of the targets of interest. Data are the mean; n ≥ 3.

Discussion

Generative deep learning is on the rise as a tool to design innovative new chemical entities (NCEs) with tailored properties such as bioactivity on an intended target. Such de novo design by deep learning is data-driven and overcomes the need for predefined rules, thus offering potential access to a wider chemical space^11,23. Among various graph-based and sequence-based approaches^{19–22,46–49}, CLM have been particularly successful in designing NCEs with experimentally confirmed activity which was achieved by fine-tuning with known ligands for the targets of interest as templates. Based on this valuable performance of CLM in transfer learning even in low-data scenarios^22,23, we here sought to extend the scope of CLM application to multi-target design, i.e., de novo design of NCEs with biological activity on two intended targets. Designed multiple ligands of proteins that are co-involved in one pathology can exhibit enhanced efficacy by exploiting synergistic effects and may be valuable for drug development. For example, in multifactorial pathologies like the MetS or chronic inflammatory diseases, designed polypharmacology could be an avenue to improve therapeutic outcomes and overcome polypharmacy. However, the development of multi-target ligands is challenging and typically requires sophisticated design and extensive optimization. CLM could accelerate solving this task as an innovative data-driven approach to designed polypharmacology.

CLM learn from data and can capture higher-order chemical features from the molecular string representation of SMILES to achieve rule-free de novo design. Fine-tuning with a tailored set of molecules comprising the desired features is the critical step in CLM development and allows biasing the model towards a chemical space region of interest. We reasoned that data-driven de novo design by CLM could offer access to multi-target ligands by employing sets of known modulators for target pairs of interest for fine-tuning. Pooled sets of template ligands indeed successfully biased the model to design molecules exhibiting similarity to both template sets. Sequential fine-tuning, in contrast, produced a bias towards one target per pair and the alternating approach seemed to rather disturb the model. Designs obtained from CLM after pooled fine-tuning resembled the combined fine-tuning sets in terms of basic molecular features and had higher predicted probabilities for interaction with the targets of interest than designs from the baseline model.

We selected multi-target de novo designs, that were computationally favored based on similarity and on sampling frequency as a model intrinsic measure, for experimental validation. Synthesis and in vitro evaluation confirmed all twelve tested designs as active on at least one of the targets of interest and seven out of twelve modulated both intended targets with varying potencies. This high success rate in obtaining dual ligands may be due to a focus on target pairs with well compatible binding sites and is likely not representative of a broader application of CLM for multi-target design. Still, this outcome demonstrates potential of CLM in designing dual modulators as innovative bioactive NCEs.

Experimentally confirmed dual ligands were successfully obtained for FXR/sEH, FXR/THRβ, and PPARδ/sEH while the computationally most favored designs obtained from the CLM for AT₁/sEH, FFAR1/sEH, and AT₁/FFAR1 did not act as dual modulators but were selective for one of the targets of interest. Known ligands of target pairs for which dual ligands were successfully obtained populate proximal regions in the chemical space indicating higher chemical similarity compared to ligands of AT₁/FFAR1 and sEH. Despite only testing one design per target pair for the less similar combinations, these observations indicate that de novo design of multi-target ligands by CLM is feasible in scenarios where the template molecules representing the targets of interest are close in the chemical space and more challenging for target combinations whose ligands are chemically more diverse—a scenario that also hinders multiple ligand development by traditional, systematic means⁸. While target pairs with too diverse ligand binding sites will likely not be accessible for designed polypharmacology due to their inability to bind a common ligand, dual modulation of target combinations with intermediate ligand similarity might be achievable with further improved CLM, combined approaches like interactome learning⁵⁰, or newer models like GPT.

Comparison with the baseline model and a CLM fine-tuned with all template molecules confirmed that the focused dual ligand CLMs captured target-specific ligand features. Designs with confirmed dual activity obtained from these dual ligand CLM comprised structural contributions from template ligands for both targets highlighting an ability of CLM to merge pharmacophores. In contrast to linking or fusing the chemical features relevant for activity, dual ligands with merged pharmacophores offer favorable drug-like properties and are most difficult to obtain in designed polypharmacology^8,9. These favorable results therefore corroborate CLM for multi-target de novo design.

Methods

Computational methods

Software

Python (v3.7.13 for CLM and v3.9.16 for visualizations), RDKit (v2022.09), DataWarrior (v5.5.0), tensorflow (v2.14.0), keras (v.2.14.0) scikit-learn (v1.2.2), matplotlib (v3.8.4), pandas (v.2.2.2) and numpy (v.1.26.4) in Python (v3.7.13). Molecular Operating Environment⁵¹ (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada) for molecular docking and pharmacophore generation and search.

Data processing

Molecules were encoded as canonical SMILES using RDKit (v2022.09, www.rdkit.org). Only SMILES up to 140 characters in length were retained and standardized in Python (v3.9.16, www.python.org) by removing stereochemistry, salts, and duplicates.

Fine-tuning sets

Molecules with IC₅₀/EC₅₀ ≤ 1 µM on the targets of interest were selected from BindingDB³⁶ (v2021). After processing (cf. Data processing), the molecules were clustered based on Morgan fingerprints³⁴ (length = 1024, radius = 2) using the k-means algorithm from scikit-learn (v1.2.2, www.scikit-learn.org) in Python (v3.9.16). The cluster number was optimized from 2 to the number of molecules to minimize the silhouette score. From the resulting clusters, the molecule with the lowest EC₅₀/IC₅₀-value was selected as representative. From this collection, the final fine-tuning molecules for each target were selected based on manual binding mode inspection and confirmation of the desired biological activity (agonism, inhibition, etc.).

CLM implementation

We used a recently published framework (www.github.com/ETHmodlab/ molecular_design_with_beam_search)²⁰ to implement the multi-target CLM in Python (v3.7.13) using tensorflow (v2.14.0) and keras API (www.tensorflow.org, v.2.14.0). The model was based on a recurrent neural network with long short-term memory (LSTM) cells and consisted of four layers with a total of 5,820,515 parameters: layer 1, BatchNormalization; layer 2, LSTM with 1024 units; layer 3, LSTM with 256 units; layer 4, BatchNormalization. The CLM was trained using the Adam optimizer (learning rate = 10⁻³), categorical cross-entropy loss, and a dropout rate of 0.4 for both LSTM layers, with the first layer set to frozen. The CLM was trained on SMILES strings encoded as one-hot vectors and all SMILES were augmented 10-fold.

Training strategies

In each strategy, the CLM was trained over 60 epochs. Pooled: both fine-tuning sets were used together in a single set to fine-tune the CLM. Alternating: the two fine-tuning sets were alternatingly presented to the CLM in each epoch. Sequential: the CLM was first trained with one fine-tuning set for 60 epochs. Then, based on the maximum similarity of the beam designs to the fine-tuning set, one epoch was selected to continue training with the other fine-tuning set.

Beam search and epoch selection

We used beam search²⁰ with a beam width of 50 to monitor fine-tuning and select epochs for sampling. The maximum SMILES string length was defined as 140 tokens.

Temperature sampling

SMILES were sampled using the softmax function parameterized by two different sampling temperatures (0.2 and 0.7). The probability of the ith character to be sampled from the CLM was computed as: $q_{i} = \exp (\frac{z_{i}}{T}) / \sum_{j} \exp (z_{j} / T)$ where $z_{i}$ is the CLM prediction for character i, T is the temperature, and q_i is the sampling probability of character i.

Top 12 selection

For both sampling temperatures (0.2, 0.7) we selected 12 CLM designs with highest sampling frequency and 12 CLM designs with highest geometric mean similarity to their fine-tuning sets. The geometric mean between frequency and similarity led to the selection of the top 12 candidate designs.

Molecular docking

From the top 12 candidate designs, molecules were selected for synthesis and in vitro characterization by molecular docking. Docking was performed in Molecular Operating Environment⁵¹ (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada). The ligand-bound X-ray structures of sEH (pdb ID: 5ali⁵²), FXR (pdb ID: 6a60⁵³), THRβ (pdb ID: 1nax⁵⁴), PPARδ (pdb ID: 5y7x), AT₁ (pdb ID: 4zud⁵⁵) and FFAR1 (pdb ID: 4phu⁵⁶) served as templates. The structures were prepared using the MOE QuickPrep tool with default settings, adjusting the protonation state of the complex. Ligands were prepared using the MOE Wash tool with dominant protonation state at pH 7.0; coordinates were rebuilt 3D; existing chirality was maintained. The following settings were used for all docking calculations: Force Field = Amber10:EHT, Receptor = Receptor and Solvent Atoms, Site = Crystallized Ligand Atoms, Placement = Template with 100 poses, Refinement = Rigid Receptor, scoring function = GBVI/WSA dG with 10 poses.

Molecular descriptors

Morgan fingerprints³⁴ (length = 1024, radius = 2), RascalMCES⁴⁵ (similarityThreshold = 0.5) and quantitative estimation of drug-likeness³⁸ (default settings) were computed using RDKit (v2022.09) in Python (v3.9.16). Total molweight, cLogP and TPSA were computed using DataWarrior⁵⁷ (v5.5.0). Synthetic accessibility score was computed using SwissADME⁴⁰. CATS³⁵ descriptors were computed with https://github.com/iwatobipen/CATS2D.

Target prediction

The web interface of the Similarity ensemble approach (SEA)³⁹ (https://sea.bkslab.org/) was used for target prediction.

Stochastic neighbor embedding

The t-SNE projection was performed with scikit-learn (v1.2.2) in Python (v3.9.16) on Morgan fingerprints and on CATS³⁵ descriptors. The following settings were chosen: perplexity = 30, learning_rate = “auto”, init = “pca”.

Pharmacophore modeling and search

For pharmacophore modeling, the ligands of the superimposed X-ray structures of sEH (pdb IDs: 5ali⁵², 3ant⁵⁸, 3wke⁵⁹), FXR (pdb IDs: 6a60⁵³, 3dcu⁶⁰, 3fli⁶¹), THRβ (pdb IDs: 1nax⁵⁴, 1r6g⁶², 6kkb⁶³) and PPARδ (pdb IDs: 5y7x, 3tkm⁶⁴, 5u46⁶⁵) were used in Molecular Operating Environment⁵¹ (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada). A consensus pharmacophore model was generated per target considering features present in at least 2 ligands. Binding site residues of the target were defined as excluded volumes. The sampled designs of the respective target pairs were screened with the respective pharmacophore models for the targets of interest.

In vitro assays

Cell culture for stably transfected CHO-K1 cell lines

CHO-K1 cells (DSMZ, ACC110) were cultured at 5% CO₂ and 37 °C in full growth medium (Ham’s F-12 medium supplemented with 10% fetal calf serum (FCS), penicillin (100 U/mL) and streptomycin (100 µg/mL)). For harvesting cells were incubated with Trypsin/EDTA. IP-One assay for AT₁ and FFAR1. To investigate the ligand dependent activation of the two GPCRs, inositol mono-phosphate (IP-One) accumulation was monitored using an HTRF based displacement assay between FRET acceptor coupled IP-One and Terbium cryptate coupled anti-IP-One antibody (IP-One assay kit, Cisbio Bioassays, Codolet, France). AT₁: the IP-One assay for AT₁ was performed according to the protocol published by Hernandez-Olmos et al.⁶⁶ In brief, by using the sleeping beauty method a CHO-K1 cell line was generated which stably overexpresses AT₁ isoform 1 and GNA11 (G-protein subunit α 11) as well as cells only overexpressing the GNA11 without any GPCR, as control. Cells were seeded into white tissue culture 384-well plates (Greiner Bio-One, Frickenhausen, Germany) at 15,000 cells/well and incubated overnight at 37 °C and 5% CO₂. The next day the medium was removed and the cells were washed four times with stimulation buffer (146 mM NaCl, 4.2 mM KCl, 1 mM CaCl₂, 0.5 mM MgCl₂, 50 mM LiCl, 5.5 mM D-glucose, 0.1% (w/v) fatty acid-free bovine serum albumin fraction V buffered with 10 mM HEPES at pH 7.4 (NaOH)) using a Tecan HydroSpeed plate washer (Tecan Deutschland GmbH, Crailsheim, Germany). Thereafter, the compounds and a total of 0.5% DMSO were added to the cells. The plate was sealed and incubated for 90 min at 37 °C. Afterwards the cells were lysed by addition of the detection agents prepared in lysis buffer according to the manufacturer′s instructions. To calculate the IP-One concentration produced by the cells, a standard curve using dilutions of unlabeled IP-One in buffer without cells was used. For control of assay performance, a full dose-response curve of the known agonist [Val5]-Angiotensin II (Sigma-Aldrich #A2900) was conducted. Antagonist activity of compounds was measured with parallel stimulation with 10 nM [Val5]-Angiotensin II. FFAR1: the cell line generation for FFAR1 was conducted according to the protocol published by Ehrler et al.⁶⁷ and the second messenger assay was performed as described for AT₁. A full dose-response curve with FFAR1 reference agonist GW9508 was measured as control for assay performance. All compounds were tested at the indicated concentration and CHO-K1 wt cells were used as control.

sEH activity assay

Soluble epoxide hydrolase (sEH) was purified as described by Hahn et al.⁶⁸ and Lukin et al.⁶⁹, respectively. Protein expression was carried out in 500 ml ZYP5052 autoinduction medium containing Kanamycin (100 mg/L) with lactose uptake-dependent induction. The fermentation procedure started at 37 °C and 200 rpm until the OD600 reached approximately 1.0. Then the temperature was reduced to 16 °C. After 24–36 h the cells (OD600 ≈ 15.0) were harvested by centrifugation at 5000 × g (4 °C, 20 min). Pellets were resuspended in a resuspension buffer containing Tris (50 mM), NaCl (0.5 M) and MgCl₂ (10 mM) at pH 8 and sonicated twice for 15 s on ice. The sonicated suspension was centrifuged (18,000 × g, 30 min, 4 °C) and the supernatant was ultracentrifuged (100,000 × g, 1 h 10 min, 4 °C). The ultracentrifugation supernatant was loaded on a 5 ml HisTrap® FF (GE Healthcare, München, Germany) column at a flow rate of 2 ml/min and washed with washing buffer containing Tris (50 mM), NaCl (0.5 M), MgCl₂ (10 mM), imidazole (20 mM) and DTT (3 mM) at pH 8, and eluted at 120 mM imidazole. The fractions containing hsEH after SDS-PAGE fraction analysis were pooled. The identity of hsEH was confirmed by Western Blotting and hsEH hydrolase activity assay. The inhibitory potency of the compounds was determined using a fluorescence-based enzyme activity assay in which PHOME (3-phenyl-cyano(6-methoxy-2-naphthalenyl)methylester-2-oxiraneacetic acid) is metabolized by sEH hydrolase towards a fluorescent naphthalene aldehyde⁷⁰. Dilution series of the compounds were incubated for 30 min with recombinant sEH (final protein concentration 3 nM). Subsequently, the fluorogenic substrate PHOME at final concentration of 50 μM was added, and the fluorescence was measured every minute for 45 min using a Tecan plate reader (ex: 360 nm, em: 465 nm; bandwidth 35 nm). Inhibition in percent was plotted against the logarithmic test compound concentration and IC₅₀-values were calculated using the “log(Inhibitor) vs. Response – Variable slope (four parameters)” equation in GraphPad Prism 7.

Hybrid Gal4 reporter gene assays

HEK293T cells (ATCC, CRL-1573) were cultured in Dulbecco’s modified eagle medium (DMEM), high glucose, supplemented with 10% FCS, sodium pyruvate (1 mM), penicillin (100 U/mL), and streptomycin (100 µg/mL) to 70–80% confluence at 37 °C and 5% CO₂ and seeded in clear 96-well plates with a density of 40,000 cells/well. After 24 h, the medium was changed to Opti-MEM without supplements. Transient transfection was performed using Lipofectamine LTX reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol with pFR-Luc (100 ng/well) as reporter plasmid, pRL-SV40 (2 ng/well) for normalization of transfection efficiency and cell growth, and one pFA-CMV-hNR-LBD clone coding for the hinge region and ligand binding domain (LBD) of the NR of interest. Five hours after transfection, the cells were incubated with Opti-MEM supplemented with penicillin (100 U/mL) and streptomycin (100 µg/mL), additionally containing 0.1% DMSO and the respective test compound or 0.1% DMSO alone as untreated control. After overnight (14–16 h) incubation, luminescence was measured using the Dual-Glo® Luciferase Assay System (Promega, Madison, WI, USA) according to the manufacturer’s protocol with a Tecan Spark® 10 M multimode microplate reader. Relative light units (RLU) for normalization of transfection efficiency and cell growth were calculated by dividing firefly luminescence by renilla luminescence and multiplication by 1000. Fold activation was obtained by dividing the mean RLU of a test compound at a respective concentration by the mean RLU of the untreated control. All samples were tested in duplicates in at least three biologically independent experiments. For dose response curve fitting and calculation of the EC₅₀/IC₅₀ values, the equation “[Agonist]/[Inhibitor] vs. response – Variable slope (four parameters)” was used in GraphPad Prism 7. The following pFA-CMV-hNR-LBD clones and reference ligands were used: THRβ (pFA-CMV-hTHRβ-LBD⁷¹, 1 µM T3), PPARδ (pFA-CMV-hPPARδ-LBD⁷², 1 µM L-165,041), FXR (pFA-CMV-hFXR-LBD⁷³, 1 µM GW4064).

Chemistry

General

Unless stated otherwise, reactions were performed in dry, pre-heated Schlenk glassware under argon atmosphere. Chemicals purchased from BLD and Merck were of reagent grade and used without further purification. Dry solvents were purchased from Sigma-Aldrich and stored under argon atmosphere. Other solvents—especially for work-up procedures—were of reagent grade or purified by distillation (iso-hexanes, EtOAc). Reactions were monitored by thin layer chromatography (TLC) on TLC Silica gel 60 F₂₅₄ aluminum sheets by Merck and visualized under fluorescent light (254 or 366 nm) or by ceric ammonium molybdate staining (5% (NH₄)₆Mo₇O₂₄ × 4 H₂O, 0.2% Ce(SO₄)₂ in 5%iger H₂SO₄). Reaction and intermediate product control was performed by mass spectrometry on an Advion Interchim MS system with APCI using either ASAP for solutions or an Advion Plate express for TLC. Manual normal phase flash column chromatography was performed under positive N₂ pressure using silica 60 M from Macherey-Nagel. Automated normal phase flash column chromatography (CC) was performed on an Interchim Puriflash XS520Plus using corresponding PF-15/30/50SIHP columns. NMR spectroscopy was performed on a Bruker BioSpin AV400 or AV500. Chemical shifts (δ) are reported in ppm relative to tetramethylsilane (TMS) or the residual solvent signal protons. Signal multiplicity is reported as singlet (s), broad singlet (bs), doublet (d), triplet (t), quartet (q), quintet (quin), heptet (hept), multiplet (m), doublet of doublets (dd), triplet of triplets (tt), doublet of triplets (dt), triplet of doublets (td). Coupling constants (J) are given in Hz. Assignments were made by means of two-dimensional experiments (¹H-¹H-COSY, ¹H-¹³C-HMQC, ¹H-¹³C-HMBC). Quantitative ¹H NMR (qHNMR) were acquired according to a method described by Pauli et al.⁷⁴ with internal calibration. The qHNMR measurements were carried out under conditions allowing complete relaxation to assure the exact determination of peak area ratios. High-resolution mass (HRMS) spectra were recorded on an MALDI LTQ ORBITRAP XL instrument (Thermo Fisher Scientific Inc., Waltham, USA). using a α-cyano-4-hydroxycinnamic acid (HCCA) matrix or on a Thermo Finnigan LTQ FT Ultra FT-ICR (ESI). LCMS analysis was performed on a LCMS 2020 from Shimadzu (Duisburg, Germany). For analytical determination, a Luna 10u C18(2) (250 × 4.6 nm) and for semi-preparative purification, a Luna 10μ C18(2) (250 × 21.20 nm) column from Phenomenex LTD Deutschland (Aschaffenburg, Germany) was used. The system is equipped with a SPD 20 A UV/VIS detector (λ = 240/280 nm) and an ESI-TOF (measuring in the positive- and/or negative-ion mode). Mixtures of MeCN/0.1% aqueous formic acid were used as mobile phase with a flow rate of 0.1 mL/min (Scout Column) or 21 mL/min (semi-preparative) at rt. The following methods were used: Method 1: linear gradient from 50 to 90% MeCN over 10 min, 90% MeCN for 5 min, linear gradient from 90 to 50% MeCN over 1 min, 50% for 2 min; Method 2: 5% MeCN over 2 min, linear gradient from 5% to 90% MeCN over 12 min, 90% MeCN for 6 min, linear gradient from 90% to 5% MeCN over 1 min, 50% for 4 min; Purity of the compounds was determined by integrating the peaks of the UV-chromatogram. All compounds for biological testing had >95% purity according to qHNMR or LCMS analysis.

Synthetic procedures and analytical characterization of the test compounds are described in the Supporting Information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Supplementary Information^{(3.7MB, pdf)}

Peer Review File^{(456.3KB, pdf)}

Reporting Summary^{(978KB, pdf)}

Source data

Source Data^{(1.1MB, xlsx)}

Acknowledgements

The authors thank Lilia Weizel for technical support and Dr. Kerstin Hiesinger for sharing HK91. This research was funded by the European Union (ERC, NeuRoPROBE, 101040355). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. This work has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No. 875510. The JU receives support from the European Union’s Horizon 2020 research and innovation program, EFPIA, Ontario Institute for Cancer Research, Royal Institution for the Advancement of Learning McGill University, Kungliga Tekniska Hoegskolan, and Diamond Light Source Limited. J.H.M.E. and E.P. thank the Deutsche Forschungsgemeinschaft (DFG, SFB1039, TP A07) for financial support.

Author contributions

L.I. and T.H. performed the computations; E.S., K.S., F.F.L., B.H., J.B., and J.P. synthesized the computational designs; E.S., J.H.M.E., and J.A.M. characterized the computational designs; L.I., T.H., J.A.M., J.P., E.P., and D.M. analyzed and interpreted the data; D.M. conceived and supervised the study; L.I. and D.M. prepared the figures and wrote the manuscript with contributions from all authors.

Peer review

Peer review information

Nature Communications thanks Jurgen Bajorath, Giovanni Bottegoni and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

All data generated in this study and supporting the results of this study are provided in the Supplementary Information and Source Data file. Source data are provided with this paper.

Code availability

Code used in this study is available at Zenodo (10.5281/zenodo.12795470)⁷⁵.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-52060-8.

References

1.Eckel, R. H., Grundy, S. M. & Zimmet, P. Z. The metabolic syndrome. Lancet365, 1415–1428 (2005). 10.1016/S0140-6736(05)66378-7 [DOI] [PubMed] [Google Scholar]
2.Grundy, S. M. Drug therapy of the metabolic syndrome: minimizing the emerging crisis in polypharmacy. Nat. Rev. Drug Discov.5, 295–309 (2006). 10.1038/nrd2005 [DOI] [PubMed] [Google Scholar]
3.Francque, S. et al. Nonalcoholic steatohepatitis: the role of peroxisome proliferator-activated receptors. Nat. Rev. Gastroenterol. Hepatol.18, 24–39 (2021). 10.1038/s41575-020-00366-5 [DOI] [PubMed] [Google Scholar]
4.Levy, M., Kolodziejczyk, A. A., Thaiss, C. A. & Elinav, E. Dysbiosis and the immune system. Nat. Rev. Immunol.17, 219–232 (2017). 10.1038/nri.2017.7 [DOI] [PubMed] [Google Scholar]
5.Hotamisligil, G. S. Inflammation, metaflammation and immunometabolic disorders. Nature542, 177–185 (2017). 10.1038/nature21363 [DOI] [PubMed] [Google Scholar]
6.Langenberg, C., Hingorani, A. D. & Whitty, C. J. M. Biological and functional multimorbidity—from mechanisms to management. Nat. Med.29, 1649–1657 (2023). 10.1038/s41591-023-02420-6 [DOI] [PubMed] [Google Scholar]
7.van Onna, M. & Boonen, A. Challenges in the management of older patients with inflammatory rheumatic diseases. Nat. Rev. Rheumatol.18, 326–334 (2022). 10.1038/s41584-022-00768-6 [DOI] [PubMed] [Google Scholar]
8.Proschak, E., Stark, H. & Merk, D. Polypharmacology by design: a medicinal chemist’s perspective on multitargeting compounds. J. Med. Chem.62, 420–444 (2019). 10.1021/acs.jmedchem.8b00760 [DOI] [PubMed] [Google Scholar]
9.Peters, J. U. Polypharmacology—foe or friend? J. Med. Chem.56, 8955–8971 (2013). 10.1021/jm400856t [DOI] [PubMed] [Google Scholar]
10.Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov.18, 463–477 (2019). 10.1038/s41573-019-0024-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Skinnider, M. A., Stacey, R. G., Wishart, D. S. & Foster, L. J. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell.3, 759–770 (2021). 10.1038/s42256-021-00368-1 [DOI] [Google Scholar]
12.Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol.79, 102527 (2023). 10.1016/j.sbi.2023.102527 [DOI] [PubMed] [Google Scholar]
13.Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today23, 1241–1250 (2018). 10.1016/j.drudis.2018.01.039 [DOI] [PubMed] [Google Scholar]
14.Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci.28, 31–36 (1988). 10.1021/ci00057a005 [DOI] [Google Scholar]
15.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv10.48550/arXiv.1701.01329 (2017). [DOI] [PMC free article] [PubMed]
16.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci.4, 120–131 (2018). 10.1021/acscentsci.7b00512 [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci.4, 268–276 (2018). 10.1021/acscentsci.7b00572 [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018). [DOI] [PMC free article] [PubMed]
19.Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inform.37, 1700153 (2018). 10.1002/minf.201700153 [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence**. Angew. Chem. Int. Ed.60, 19477–19482 (2021). [DOI] [PMC free article] [PubMed]
21.Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun.14, 114 (2023). 10.1038/s41467-022-35692-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Ballarotto, M. et al. De novo design of Nurr1 agonists via fragment-augmented generative deep learning in low-data regime. J. Med. Chem.66, 8170–8177 (2023). 10.1021/acs.jmedchem.3c00485 [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell.2, 171–180 (2020). 10.1038/s42256-020-0160-y [DOI] [Google Scholar]
24.Grisoni, F. & Schneider, G. De novo molecular design with chemical language models. In Methods in Molecular Biology (ed. Heifetz, A.) 2390, 207–232 (Humana Press Inc., 2022). [DOI] [PubMed]
25.Awale, M., Sirockin, F., Stiefl, N. & Reymond, J. L. Drug analogs from fragment-based long short-term memory generative neural networks. J. Chem. Inf. Model.59, 1347–1356 (2019). 10.1021/acs.jcim.8b00902 [DOI] [PubMed] [Google Scholar]
26.Kawai, T. et al. AT1 receptor signaling pathways in the cardiovascular system. Pharmacol. Res.125, 4–13 (2017). 10.1016/j.phrs.2017.05.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Sun, L., Cai, J. & Gonzalez, F. J. The role of farnesoid X receptor in metabolic diseases, and gastrointestinal and liver cancer. Nat. Rev. Gastroenterol. Hepatol.18, 335–347 (2021). 10.1038/s41575-020-00404-2 [DOI] [PubMed] [Google Scholar]
28.Kimura, I., Ichimura, A., Ohue-Kitano, R. & Igarashi, M. Free fatty acid receptors in health and disease. Physiol. Rev.100, 171–210 (2020). 10.1152/physrev.00041.2018 [DOI] [PubMed] [Google Scholar]
29.Montaigne, D., Butruille, L. & Staels, B. PPAR control of metabolism and cardiovascular functions. Nat. Rev. Cardiol.18, 809–823 (2021). 10.1038/s41569-021-00569-6 [DOI] [PubMed] [Google Scholar]
30.Saponaro, F., Sestito, S., Runfola, M., Rapposelli, S. & Chiellini, G. Selective thyroid hormone receptor-beta (TRβ) agonists: new perspectives for the treatment of metabolic and neurodegenerative disorders. Front. Med.7, 331 (2020). 10.3389/fmed.2020.00331 [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Imig, J. D., Cervenka, L. & Neckar, J. Epoxylipids and soluble epoxide hydrolase in heart diseases. Biochem. Pharmacol.195, 114866 (2022). 10.1016/j.bcp.2021.114866 [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Wagner, K. M., McReynolds, C. B., Schmidt, W. K. & Hammock, B. D. Soluble epoxide hydrolase as a therapeutic target for pain, inflammatory and neurodegenerative diseases. Pharmacol. Ther.180, 62–76 (2017). 10.1016/j.pharmthera.2017.06.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Cosentino, F. et al. 2019 ESC guidelines on diabetes, pre-diabetes, and cardiovascular diseases developed in collaboration with the EASD. Eur. Heart J.41, 255–323 (2020). 10.1093/eurheartj/ehz486 [DOI] [PubMed] [Google Scholar]
34.Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at chemical abstracts service. J. Chem. Doc.5, 107–113 (1965). 10.1021/c160017a018 [DOI] [Google Scholar]
35.Reutlinger, M. et al. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inform.32, 133–138 (2013). 10.1002/minf.201200141 [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res.44, D1045–D1053 (2016). 10.1093/nar/gkv1072 [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res.47, D930–D940 (2019). 10.1093/nar/gky1075 [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem.4, 90–98 (2012). 10.1038/nchem.1243 [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Keiser, M. J. et al. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol.25, 197–206 (2007). 10.1038/nbt1284 [DOI] [PubMed] [Google Scholar]
40.Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep.7, 42717 (2017). 10.1038/srep42717 [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Schmidt, J. et al. A dual modulator of farnesoid X receptor and soluble epoxide hydrolase to counter nonalcoholic steatohepatitis. J. Med. Chem.60, 7703–7724 (2017). 10.1021/acs.jmedchem.7b00398 [DOI] [PubMed] [Google Scholar]
42.Schierle, S. et al. Design and structural optimization of dual FXR/PPARδ activators. J. Med. Chem.63, 8369–8379 (2020). 10.1021/acs.jmedchem.0c00618 [DOI] [PubMed] [Google Scholar]
43.Arifi, S. et al. Targeting the alternative vitamin E metabolite binding site enables noncanonical PPARγ modulation. J. Am. Chem. Soc.145, 14802–14810 (2023). 10.1021/jacs.3c03417 [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Blöcher, R. et al. N-benzylbenzamides: a novel merged scaffold for orally available dual soluble epoxide hydrolase/peroxisome proliferator-activated receptor γ modulators. J. Med. Chem.59, 61–81 (2016). 10.1021/acs.jmedchem.5b01239 [DOI] [PubMed] [Google Scholar]
45.Raymond, J. W. RASCAL: calculation of graph similarity using maximum common edge subgraphs. Comput. J.45, 631–644 (2002). 10.1093/comjnl/45.6.631 [DOI] [Google Scholar]
46.Wang, M. et al. RELATION: a deep generative model for structure-based de novo drug design. J. Med. Chem.65, 9478–9492 (2022). 10.1021/acs.jmedchem.2c00732 [DOI] [PubMed] [Google Scholar]
47.Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell.3, 1023–1032 (2021). 10.1038/s42256-021-00418-8 [DOI] [Google Scholar]
48.Skalic, M., Jiménez, J., Sabbadin, D. & De Fabritiis, G. Shape-based generative modeling for de novo drug design. J. Chem. Inf. Model.59, 1205–1214 (2019). 10.1021/acs.jcim.8b00706 [DOI] [PubMed] [Google Scholar]
49.Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model.60, 5918–5922 (2020). 10.1021/acs.jcim.0c00915 [DOI] [PubMed] [Google Scholar]
50.Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun.15, 3408 (2024). 10.1038/s41467-024-47613-w [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Molecular Operating Environment (MOE). 2022.02 Chemical Computing Group ULC, 1010 Sherbooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7. (2022).
52.Öster, L., Tapani, S., Xue, Y. & Käck, H. Successful generation of structural information for fragment-based drug discovery. Drug Discov. Today20, 1104–1111 (2015). 10.1016/j.drudis.2015.04.005 [DOI] [PubMed] [Google Scholar]
53.Wang, N., Zou, Q., Xu, J., Zhang, J. & Liu, J. Ligand binding and heterodimerization with retinoid X receptor α (RXRα) induce farnesoid X receptor (FXR) conformational changes affecting coactivator binding. J. Biol. Chem.293, 18180–18191 (2018). 10.1074/jbc.RA118.004652 [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Ye, L. et al. Thyroid receptor ligands. 1. Agonist ligands selective for the thyroid receptor β1. J. Med. Chem.46, 1580–1588 (2003). 10.1021/jm021080f [DOI] [PubMed] [Google Scholar]
55.Zhang, H. et al. Structural basis for ligand recognition and functional selectivity at angiotensin receptor. J. Biol. Chem.290, 29127–29139 (2015). 10.1074/jbc.M115.689000 [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Srivastava, A. et al. High-resolution structure of the human GPR40 receptor bound to allosteric agonist TAK-875. Nature513, 124–127 (2014). 10.1038/nature13494 [DOI] [PubMed] [Google Scholar]
57.Sander, T., Freyss, J., Von Korff, M. & Rufener, C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J. Chem. Inf. Model.55, 460–473 (2015). 10.1021/ci500588j [DOI] [PubMed] [Google Scholar]
58.Tanaka, D. et al. A practical use of ligand efficiency indices out of the fragment-based approach: ligand efficiency-guided lead identification of soluble epoxide hydrolase inhibitors. J. Med. Chem.54, 851–857 (2011). 10.1021/jm101273e [DOI] [PubMed] [Google Scholar]
59.Amano, Y., Yamaguchi, T. & Tanabe, E. Structural insights into binding of inhibitors to soluble epoxide hydrolase gained by fragment screening and X-ray crystallography. Bioorg. Med. Chem.22, 2427–2434 (2014). 10.1016/j.bmc.2014.03.001 [DOI] [PubMed] [Google Scholar]
60.Akwabi-Ameyaw, A. et al. Conformationally constrained farnesoid X receptor (FXR) agonists: naphthoic acid-based analogs of GW 4064. Bioorg. Med. Chem. Lett.18, 4339–4343 (2008). 10.1016/j.bmcl.2008.06.073 [DOI] [PubMed] [Google Scholar]
61.Flatt, B. et al. Discovery of XL335 (WAY-362450), a highly potent, selective, and orally active agonist of the farnesoid X receptor (FXR). J. Med. Chem.52, 904–907 (2009). 10.1021/jm8014124 [DOI] [PubMed] [Google Scholar]
62.Hangeland, J. J. et al. Thyroid receptor ligands. Part 2: thyromimetics with improved selectivity for the thyroid hormone receptor beta. Bioorg. Med. Chem. Lett.14, 3549–3553 (2004). 10.1016/j.bmcl.2004.04.032 [DOI] [PubMed] [Google Scholar]
63.Yao, B. et al. Revealing a mutant-induced receptor allosteric mechanism for the thyroid hormone resistance. iScience20, 489–496 (2019). 10.1016/j.isci.2019.10.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Batista, F. A. H. et al. Structural insights into human peroxisome proliferator activated receptor delta (PPAR-Delta) selective ligand binding. PLoS ONE7, e33643 (2012). 10.1371/journal.pone.0033643 [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Wu, C. C. et al. Structural basis for specific ligation of the peroxisome proliferator-activated receptor. Proc. Natl. Acad. Sci. USA114, E2563–E2570 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
66.Hernandez-Olmos, V. et al. Discovery of irbesartan derivatives as BLT2 agonists by virtual screening. ACS Med. Chem. Lett.12, 1261–1266 (2021). 10.1021/acsmedchemlett.1c00240 [DOI] [PMC free article] [PubMed] [Google Scholar]
67.Ehrler, J. H. M. et al. Compilation and evaluation of a fatty acid mimetics screening library. Biochem. Pharmacol.204, 115191 (2022). 10.1016/j.bcp.2022.115191 [DOI] [PubMed] [Google Scholar]
68.Hahn, S. et al. Complementary screening techniques yielded fragments that inhibit the phosphatase activity of soluble epoxide hydrolase. ChemMedChem6, 2146–2149 (2011). 10.1002/cmdc.201100433 [DOI] [PubMed] [Google Scholar]
69.Lukin, A. et al. Discovery of polar spirocyclic orally bioavailable urea inhibitors of soluble epoxide hydrolase. Bioorg. Chem.80, 655–667 (2018). 10.1016/j.bioorg.2018.07.014 [DOI] [PubMed] [Google Scholar]
70.Wolf, N. M., Morisseau, C., Jones, P. D., Hock, B. & Hammock, B. D. Development of a high-throughput screen for soluble epoxide hydrolase inhibition. Anal. Biochem.355, 71–80 (2006). 10.1016/j.ab.2006.04.045 [DOI] [PMC free article] [PubMed] [Google Scholar]
71.Gellrich, L. et al. l-Thyroxin and the nonclassical thyroid hormone TETRAC are potent activators of PPARγ. J. Med. Chem.63, 6727–6740 (2020). 10.1021/acs.jmedchem.9b02150 [DOI] [PubMed] [Google Scholar]
72.Rau, O. et al. Carnosic acid and carnosol, phenolic diterpene compounds of the labiate herbs rosemary and sage, are activators of the human peroxisome proliferator-activated receptor gamma. Planta Med.72, 881–887 (2006). 10.1055/s-2006-946680 [DOI] [PubMed] [Google Scholar]
73.Schmidt, J. et al. NSAIDs ibuprofen, indometacin, and diclofenac do not interact with farnesoid X receptor. Sci. Rep.5, 14782 (2015). 10.1038/srep14782 [DOI] [PMC free article] [PubMed] [Google Scholar]
74.Pauli, G. F. et al. Importance of purity evaluation and the potential of quantitative 1H NMR as a purity assay. J. Med. Chem.57, 9220–9231 (2014). 10.1021/jm500734a [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Isigkeit, L. et al. Automated design of multi-target ligands by generative deep learning. Zenodo (2024). [DOI] [PMC free article] [PubMed]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(3.7MB, pdf)}

Peer Review File^{(456.3KB, pdf)}

Reporting Summary^{(978KB, pdf)}

Source Data^{(1.1MB, xlsx)}

Data Availability Statement

All data generated in this study and supporting the results of this study are provided in the Supplementary Information and Source Data file. Source data are provided with this paper.

Code used in this study is available at Zenodo (10.5281/zenodo.12795470)⁷⁵.

[CR1] 1.Eckel, R. H., Grundy, S. M. & Zimmet, P. Z. The metabolic syndrome. Lancet365, 1415–1428 (2005). 10.1016/S0140-6736(05)66378-7 [DOI] [PubMed] [Google Scholar]

[CR2] 2.Grundy, S. M. Drug therapy of the metabolic syndrome: minimizing the emerging crisis in polypharmacy. Nat. Rev. Drug Discov.5, 295–309 (2006). 10.1038/nrd2005 [DOI] [PubMed] [Google Scholar]

[CR3] 3.Francque, S. et al. Nonalcoholic steatohepatitis: the role of peroxisome proliferator-activated receptors. Nat. Rev. Gastroenterol. Hepatol.18, 24–39 (2021). 10.1038/s41575-020-00366-5 [DOI] [PubMed] [Google Scholar]

[CR4] 4.Levy, M., Kolodziejczyk, A. A., Thaiss, C. A. & Elinav, E. Dysbiosis and the immune system. Nat. Rev. Immunol.17, 219–232 (2017). 10.1038/nri.2017.7 [DOI] [PubMed] [Google Scholar]

[CR5] 5.Hotamisligil, G. S. Inflammation, metaflammation and immunometabolic disorders. Nature542, 177–185 (2017). 10.1038/nature21363 [DOI] [PubMed] [Google Scholar]

[CR6] 6.Langenberg, C., Hingorani, A. D. & Whitty, C. J. M. Biological and functional multimorbidity—from mechanisms to management. Nat. Med.29, 1649–1657 (2023). 10.1038/s41591-023-02420-6 [DOI] [PubMed] [Google Scholar]

[CR7] 7.van Onna, M. & Boonen, A. Challenges in the management of older patients with inflammatory rheumatic diseases. Nat. Rev. Rheumatol.18, 326–334 (2022). 10.1038/s41584-022-00768-6 [DOI] [PubMed] [Google Scholar]

[CR8] 8.Proschak, E., Stark, H. & Merk, D. Polypharmacology by design: a medicinal chemist’s perspective on multitargeting compounds. J. Med. Chem.62, 420–444 (2019). 10.1021/acs.jmedchem.8b00760 [DOI] [PubMed] [Google Scholar]

[CR9] 9.Peters, J. U. Polypharmacology—foe or friend? J. Med. Chem.56, 8955–8971 (2013). 10.1021/jm400856t [DOI] [PubMed] [Google Scholar]

[CR10] 10.Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov.18, 463–477 (2019). 10.1038/s41573-019-0024-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Skinnider, M. A., Stacey, R. G., Wishart, D. S. & Foster, L. J. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell.3, 759–770 (2021). 10.1038/s42256-021-00368-1 [DOI] [Google Scholar]

[CR12] 12.Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol.79, 102527 (2023). 10.1016/j.sbi.2023.102527 [DOI] [PubMed] [Google Scholar]

[CR13] 13.Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today23, 1241–1250 (2018). 10.1016/j.drudis.2018.01.039 [DOI] [PubMed] [Google Scholar]

[CR14] 14.Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci.28, 31–36 (1988). 10.1021/ci00057a005 [DOI] [Google Scholar]

[CR15] 15.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv10.48550/arXiv.1701.01329 (2017). [DOI] [PMC free article] [PubMed]

[CR16] 16.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci.4, 120–131 (2018). 10.1021/acscentsci.7b00512 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci.4, 268–276 (2018). 10.1021/acscentsci.7b00572 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018). [DOI] [PMC free article] [PubMed]

[CR19] 19.Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inform.37, 1700153 (2018). 10.1002/minf.201700153 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence**. Angew. Chem. Int. Ed.60, 19477–19482 (2021). [DOI] [PMC free article] [PubMed]

[CR21] 21.Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun.14, 114 (2023). 10.1038/s41467-022-35692-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Ballarotto, M. et al. De novo design of Nurr1 agonists via fragment-augmented generative deep learning in low-data regime. J. Med. Chem.66, 8170–8177 (2023). 10.1021/acs.jmedchem.3c00485 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell.2, 171–180 (2020). 10.1038/s42256-020-0160-y [DOI] [Google Scholar]

[CR24] 24.Grisoni, F. & Schneider, G. De novo molecular design with chemical language models. In Methods in Molecular Biology (ed. Heifetz, A.) 2390, 207–232 (Humana Press Inc., 2022). [DOI] [PubMed]

[CR25] 25.Awale, M., Sirockin, F., Stiefl, N. & Reymond, J. L. Drug analogs from fragment-based long short-term memory generative neural networks. J. Chem. Inf. Model.59, 1347–1356 (2019). 10.1021/acs.jcim.8b00902 [DOI] [PubMed] [Google Scholar]

[CR26] 26.Kawai, T. et al. AT1 receptor signaling pathways in the cardiovascular system. Pharmacol. Res.125, 4–13 (2017). 10.1016/j.phrs.2017.05.008 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Sun, L., Cai, J. & Gonzalez, F. J. The role of farnesoid X receptor in metabolic diseases, and gastrointestinal and liver cancer. Nat. Rev. Gastroenterol. Hepatol.18, 335–347 (2021). 10.1038/s41575-020-00404-2 [DOI] [PubMed] [Google Scholar]

[CR28] 28.Kimura, I., Ichimura, A., Ohue-Kitano, R. & Igarashi, M. Free fatty acid receptors in health and disease. Physiol. Rev.100, 171–210 (2020). 10.1152/physrev.00041.2018 [DOI] [PubMed] [Google Scholar]

[CR29] 29.Montaigne, D., Butruille, L. & Staels, B. PPAR control of metabolism and cardiovascular functions. Nat. Rev. Cardiol.18, 809–823 (2021). 10.1038/s41569-021-00569-6 [DOI] [PubMed] [Google Scholar]

[CR30] 30.Saponaro, F., Sestito, S., Runfola, M., Rapposelli, S. & Chiellini, G. Selective thyroid hormone receptor-beta (TRβ) agonists: new perspectives for the treatment of metabolic and neurodegenerative disorders. Front. Med.7, 331 (2020). 10.3389/fmed.2020.00331 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Imig, J. D., Cervenka, L. & Neckar, J. Epoxylipids and soluble epoxide hydrolase in heart diseases. Biochem. Pharmacol.195, 114866 (2022). 10.1016/j.bcp.2021.114866 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Wagner, K. M., McReynolds, C. B., Schmidt, W. K. & Hammock, B. D. Soluble epoxide hydrolase as a therapeutic target for pain, inflammatory and neurodegenerative diseases. Pharmacol. Ther.180, 62–76 (2017). 10.1016/j.pharmthera.2017.06.006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Cosentino, F. et al. 2019 ESC guidelines on diabetes, pre-diabetes, and cardiovascular diseases developed in collaboration with the EASD. Eur. Heart J.41, 255–323 (2020). 10.1093/eurheartj/ehz486 [DOI] [PubMed] [Google Scholar]

[CR34] 34.Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at chemical abstracts service. J. Chem. Doc.5, 107–113 (1965). 10.1021/c160017a018 [DOI] [Google Scholar]

[CR35] 35.Reutlinger, M. et al. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inform.32, 133–138 (2013). 10.1002/minf.201200141 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res.44, D1045–D1053 (2016). 10.1093/nar/gkv1072 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res.47, D930–D940 (2019). 10.1093/nar/gky1075 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem.4, 90–98 (2012). 10.1038/nchem.1243 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Keiser, M. J. et al. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol.25, 197–206 (2007). 10.1038/nbt1284 [DOI] [PubMed] [Google Scholar]

[CR40] 40.Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep.7, 42717 (2017). 10.1038/srep42717 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Schmidt, J. et al. A dual modulator of farnesoid X receptor and soluble epoxide hydrolase to counter nonalcoholic steatohepatitis. J. Med. Chem.60, 7703–7724 (2017). 10.1021/acs.jmedchem.7b00398 [DOI] [PubMed] [Google Scholar]

[CR42] 42.Schierle, S. et al. Design and structural optimization of dual FXR/PPARδ activators. J. Med. Chem.63, 8369–8379 (2020). 10.1021/acs.jmedchem.0c00618 [DOI] [PubMed] [Google Scholar]

[CR43] 43.Arifi, S. et al. Targeting the alternative vitamin E metabolite binding site enables noncanonical PPARγ modulation. J. Am. Chem. Soc.145, 14802–14810 (2023). 10.1021/jacs.3c03417 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Blöcher, R. et al. N-benzylbenzamides: a novel merged scaffold for orally available dual soluble epoxide hydrolase/peroxisome proliferator-activated receptor γ modulators. J. Med. Chem.59, 61–81 (2016). 10.1021/acs.jmedchem.5b01239 [DOI] [PubMed] [Google Scholar]

[CR45] 45.Raymond, J. W. RASCAL: calculation of graph similarity using maximum common edge subgraphs. Comput. J.45, 631–644 (2002). 10.1093/comjnl/45.6.631 [DOI] [Google Scholar]

[CR46] 46.Wang, M. et al. RELATION: a deep generative model for structure-based de novo drug design. J. Med. Chem.65, 9478–9492 (2022). 10.1021/acs.jmedchem.2c00732 [DOI] [PubMed] [Google Scholar]

[CR47] 47.Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell.3, 1023–1032 (2021). 10.1038/s42256-021-00418-8 [DOI] [Google Scholar]

[CR48] 48.Skalic, M., Jiménez, J., Sabbadin, D. & De Fabritiis, G. Shape-based generative modeling for de novo drug design. J. Chem. Inf. Model.59, 1205–1214 (2019). 10.1021/acs.jcim.8b00706 [DOI] [PubMed] [Google Scholar]

[CR49] 49.Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model.60, 5918–5922 (2020). 10.1021/acs.jcim.0c00915 [DOI] [PubMed] [Google Scholar]

[CR50] 50.Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun.15, 3408 (2024). 10.1038/s41467-024-47613-w [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Molecular Operating Environment (MOE). 2022.02 Chemical Computing Group ULC, 1010 Sherbooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7. (2022).

[CR52] 52.Öster, L., Tapani, S., Xue, Y. & Käck, H. Successful generation of structural information for fragment-based drug discovery. Drug Discov. Today20, 1104–1111 (2015). 10.1016/j.drudis.2015.04.005 [DOI] [PubMed] [Google Scholar]

[CR53] 53.Wang, N., Zou, Q., Xu, J., Zhang, J. & Liu, J. Ligand binding and heterodimerization with retinoid X receptor α (RXRα) induce farnesoid X receptor (FXR) conformational changes affecting coactivator binding. J. Biol. Chem.293, 18180–18191 (2018). 10.1074/jbc.RA118.004652 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Ye, L. et al. Thyroid receptor ligands. 1. Agonist ligands selective for the thyroid receptor β1. J. Med. Chem.46, 1580–1588 (2003). 10.1021/jm021080f [DOI] [PubMed] [Google Scholar]

[CR55] 55.Zhang, H. et al. Structural basis for ligand recognition and functional selectivity at angiotensin receptor. J. Biol. Chem.290, 29127–29139 (2015). 10.1074/jbc.M115.689000 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Srivastava, A. et al. High-resolution structure of the human GPR40 receptor bound to allosteric agonist TAK-875. Nature513, 124–127 (2014). 10.1038/nature13494 [DOI] [PubMed] [Google Scholar]

[CR57] 57.Sander, T., Freyss, J., Von Korff, M. & Rufener, C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J. Chem. Inf. Model.55, 460–473 (2015). 10.1021/ci500588j [DOI] [PubMed] [Google Scholar]

[CR58] 58.Tanaka, D. et al. A practical use of ligand efficiency indices out of the fragment-based approach: ligand efficiency-guided lead identification of soluble epoxide hydrolase inhibitors. J. Med. Chem.54, 851–857 (2011). 10.1021/jm101273e [DOI] [PubMed] [Google Scholar]

[CR59] 59.Amano, Y., Yamaguchi, T. & Tanabe, E. Structural insights into binding of inhibitors to soluble epoxide hydrolase gained by fragment screening and X-ray crystallography. Bioorg. Med. Chem.22, 2427–2434 (2014). 10.1016/j.bmc.2014.03.001 [DOI] [PubMed] [Google Scholar]

[CR60] 60.Akwabi-Ameyaw, A. et al. Conformationally constrained farnesoid X receptor (FXR) agonists: naphthoic acid-based analogs of GW 4064. Bioorg. Med. Chem. Lett.18, 4339–4343 (2008). 10.1016/j.bmcl.2008.06.073 [DOI] [PubMed] [Google Scholar]

[CR61] 61.Flatt, B. et al. Discovery of XL335 (WAY-362450), a highly potent, selective, and orally active agonist of the farnesoid X receptor (FXR). J. Med. Chem.52, 904–907 (2009). 10.1021/jm8014124 [DOI] [PubMed] [Google Scholar]

[CR62] 62.Hangeland, J. J. et al. Thyroid receptor ligands. Part 2: thyromimetics with improved selectivity for the thyroid hormone receptor beta. Bioorg. Med. Chem. Lett.14, 3549–3553 (2004). 10.1016/j.bmcl.2004.04.032 [DOI] [PubMed] [Google Scholar]

[CR63] 63.Yao, B. et al. Revealing a mutant-induced receptor allosteric mechanism for the thyroid hormone resistance. iScience20, 489–496 (2019). 10.1016/j.isci.2019.10.002 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Batista, F. A. H. et al. Structural insights into human peroxisome proliferator activated receptor delta (PPAR-Delta) selective ligand binding. PLoS ONE7, e33643 (2012). 10.1371/journal.pone.0033643 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Wu, C. C. et al. Structural basis for specific ligation of the peroxisome proliferator-activated receptor. Proc. Natl. Acad. Sci. USA114, E2563–E2570 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR66] 66.Hernandez-Olmos, V. et al. Discovery of irbesartan derivatives as BLT2 agonists by virtual screening. ACS Med. Chem. Lett.12, 1261–1266 (2021). 10.1021/acsmedchemlett.1c00240 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR67] 67.Ehrler, J. H. M. et al. Compilation and evaluation of a fatty acid mimetics screening library. Biochem. Pharmacol.204, 115191 (2022). 10.1016/j.bcp.2022.115191 [DOI] [PubMed] [Google Scholar]

[CR68] 68.Hahn, S. et al. Complementary screening techniques yielded fragments that inhibit the phosphatase activity of soluble epoxide hydrolase. ChemMedChem6, 2146–2149 (2011). 10.1002/cmdc.201100433 [DOI] [PubMed] [Google Scholar]

[CR69] 69.Lukin, A. et al. Discovery of polar spirocyclic orally bioavailable urea inhibitors of soluble epoxide hydrolase. Bioorg. Chem.80, 655–667 (2018). 10.1016/j.bioorg.2018.07.014 [DOI] [PubMed] [Google Scholar]

[CR70] 70.Wolf, N. M., Morisseau, C., Jones, P. D., Hock, B. & Hammock, B. D. Development of a high-throughput screen for soluble epoxide hydrolase inhibition. Anal. Biochem.355, 71–80 (2006). 10.1016/j.ab.2006.04.045 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR71] 71.Gellrich, L. et al. l-Thyroxin and the nonclassical thyroid hormone TETRAC are potent activators of PPARγ. J. Med. Chem.63, 6727–6740 (2020). 10.1021/acs.jmedchem.9b02150 [DOI] [PubMed] [Google Scholar]

[CR72] 72.Rau, O. et al. Carnosic acid and carnosol, phenolic diterpene compounds of the labiate herbs rosemary and sage, are activators of the human peroxisome proliferator-activated receptor gamma. Planta Med.72, 881–887 (2006). 10.1055/s-2006-946680 [DOI] [PubMed] [Google Scholar]

[CR73] 73.Schmidt, J. et al. NSAIDs ibuprofen, indometacin, and diclofenac do not interact with farnesoid X receptor. Sci. Rep.5, 14782 (2015). 10.1038/srep14782 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR74] 74.Pauli, G. F. et al. Importance of purity evaluation and the potential of quantitative 1H NMR as a purity assay. J. Med. Chem.57, 9220–9231 (2014). 10.1021/jm500734a [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR75] 75.Isigkeit, L. et al. Automated design of multi-target ligands by generative deep learning. Zenodo (2024). [DOI] [PMC free article] [PubMed]

PERMALINK

Automated design of multi-target ligands by generative deep learning

Laura Isigkeit

Tim Hörmann

Espen Schallmayer

Katharina Scholz

Felix F Lillich

Johanna H M Ehrler

Benedikt Hufnagel

Jasmin Büchner

Julian A Marschner

Jörg Pabel

Ewgenij Proschak

Daniel Merk

Abstract

Introduction

Fig. 1. Metabolic syndrome and application of a chemical language model for multi-target de novo design.

Results

CLM capture molecular features of pooled ligand classes

Fig. 2. Analysis of known ligands for targets of interest.

Fig. 3. Fine-tuning of chemical language models (CLM) for multi-target design.

CLM designed dual ligands modulate both intended targets

Fig. 4. Synthesis of the dual ligand candidates 1–9 designed by the chemical language model (CLM).

Table. 1.

CLM designs fuse molecular and pharmacophore features of ligands for two targets

Fig. 5. Features of dual ligands designed by the chemical language models (CLM).

Fig. 6. Validation and comparison of the dual ligand design approach.

Table. 2.

Dual modulator design by CLM fails for target pairs with highly diverse ligands

Fig. 7. Designed dual ligand candidates for AT1/sEH (10), FFAR1/sEH (11) and FFAR1/AT1 (12) from chemical language models.

Table 3.

Discussion

Methods

Computational methods

Software

Data processing

Fine-tuning sets

CLM implementation

Training strategies

Beam search and epoch selection

Temperature sampling

Top 12 selection

Molecular docking

Molecular descriptors

Target prediction

Stochastic neighbor embedding

Pharmacophore modeling and search

In vitro assays

Cell culture for stably transfected CHO-K1 cell lines

sEH activity assay

Hybrid Gal4 reporter gene assays

Chemistry

General

Reporting summary

Supplementary information

Source data

Acknowledgements

Author contributions

Peer review

Peer review information

Funding

Data availability

Code availability

Competing interests

Footnotes

Supplementary information

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Fig. 7. Designed dual ligand candidates for AT₁/sEH (10), FFAR1/sEH (11) and FFAR1/AT₁ (12) from chemical language models.