2025 Jul 4;65(14):7734–7748. doi: 10.1021/acs.jcim.5c00907

Ultrahigh-Throughput Virtual Screening Strategies against PPI Targets: A Case Study of STAT Inhibitors

Tibor Viktor Szalai 1,2, Nikolett Péczka 1,3, Levente Sipos-Szabó 1,3, László Petri 1, Dávid Bajusz 1,*, György M Keserű 1,3,*
PMCID: PMC12308805  PMID: 40611790

Abstract

In recent years, virtual screening of ultralarge (10⁸+) libraries of synthetically accessible compounds (uHTVS) has become a popular approach in hit identification. With AI-assisted virtual screening workflows, such as Deep Docking, these protocols might be feasible even without supercomputers. Yet, these methodologies have their own conceptual limitations, including the fact that physics-based docking is replaced by a cheaper deep learning (DL) step for the vast majority of compounds. In turn, the performance of this DL step will highly depend on the performance of the underlying docking model that is used to evaluate parts of the whole data set to train the DL architecture itself. Here, we evaluated the performance of the popular Deep Docking workflow on compound libraries of different sizes, against benchmark cases of classic brute-force docking approaches conducted on smaller libraries. We were especially interested in more difficult, protein–protein interaction-type oncotargets where the reliability of the underlying docking model is harder to assess. Specifically, our virtual screens have resulted in several new inhibitors of two oncogenic transcription factors, STAT3 and STAT5b. For STAT5b, in particular, we disclose the first application of virtual screening against its N-terminal domain, whose importance was recognized more recently. While the AI-based uHTVS is computationally more demanding, it can achieve exceptionally good hit rates (50.0% for STAT3). Deep Docking can also work well with a compound library containing only several million (instead of several billion) compounds, achieving a 42.9% hit rate against the SH2 domain of STAT5b, while presenting a highly economic workflow with just under 120,000 compounds actually docked.



Introduction

Searching “synthetically accessible” or “make-on-demand” ultralarge (10⁸+ compounds) chemical databases provides a unique opportunity to sample the corresponding chemical space effectively. Ultrahigh-throughput virtual screening (uHTVS) methods designed for these applications offset the much increased computational demand with a logical layer, such as a deep learning model or a bottom-up (e.g., fragment- or synthon-based) generative approach. Contemporary, large chemical libraries of compound vendors and aggregators like Enamine, Mcule, and eMolecules now routinely offer billions (or even trillions) of synthesizable virtual compounds. As the “brute-force” evaluation (docking every compound in the library) of these libraries is seriously demanding (or, without the proper infrastructure, completely unfeasible), efforts were made to reduce the computational cost of uHTVS without losing a significant amount of true hits. These efforts include enhancing the speed of docking by using GPUs and input-output optimization or using integrated workflows that reduce the number of actually docked compounds (and thereby, the computational cost) by using iterative machine learning (ML) to generate a deep neural network model or a synthon-based concept for combinatorial library generation. The uHTVS method might increase the chance of identifying new, potent inhibitors. This is particularly advantageous for targeting protein–protein interactions (PPIs), which is widely considered to be more challenging due to the lack of a single, deep, and well-defined binding cavity. At the same time, the performance of AI-based uHTVS methods is highly dependent on the performance of the underlying docking model (that is used to train the deep learning model itself), which is considered to be worse for PPI targets for the very same reason.
Therefore, we intended to test the performance of contemporary uHTVS methods, particularly Deep Docking, on STAT3 and STAT5b, which are a pair of relevant PPI-type oncotargets. A traditional workflow of "brute-force" docking with smaller compound libraries is performed as a benchmark, and finally, the most cost-effective setup is applied to discover new inhibitors of the N-terminal domain of STAT5b, which was recently established as a promising pharmaceutical target.

Signal transducer and activator of transcription (STAT) proteins are a family of transcription factors with key roles in cytokine signaling, growth factor stimulation, and DNA transcription activation. There are seven identified members of the STAT protein family (STAT1, STAT2, STAT3, STAT4, STAT5a, STAT5b, and STAT6), each having a unique role in cytokine signaling. When the physiological significance of each STAT protein in ‘knockout’ mice in vivo was investigated, the absence of STAT1 or STAT2 proteins caused a lack of immune response to interferons, making the living organism more susceptible to infections. Absence of STAT3 could not be investigated in mice, as it caused embryonic lethality; however, a different study discussed the role of STAT3 in the migration of keratinocytes in skin, in the activation of T cells by interleukin-6, in the signaling of macrophages, and in the apoptosis of mammary gland cells. STAT4 and STAT6 proteins have roles in T lymphocyte development by activation through some types of cytokines (particularly interleukin-4, interleukin-12, and interleukin-13), and the lack of these proteins resulted in the living organism being unresponsive to these cytokines. Absence of STAT5a resulted in impaired mammary gland development and lactogenesis and an impairment in peripheral T lymphocyte proliferation, while absence of STAT5b resulted in a major loss of multiple sexual dimorphisms.

Although STAT proteins have different roles, they have a highly conserved, modular structure with six domains. The N-terminal domain (NTD) is used for higher-order homo- and heterodimerization, the coiled-coil domain (CCD) interacts with other proteins which participate in nuclear import and export, the DNA binding domain (DBD) is used to identify and bind to the target gene palindrome sequence, the linker domain (LD) participates in the phosphorylation of the protein, the Src Homology 2 (SH2) domain is used to identify phosphotyrosine sites and regulates the protein–protein interactions (PPIs), stabilizing STAT homodimer formation through phosphotyrosine-SH2 interactions, and last the transcription activation domain (TAD) found at the C-terminal end contains the tyrosine and serine phosphorylation sites required for gene transcription activation. Out of the six domains, the most commonly targeted is the SH2 domain for its well-defined phosphotyrosine-binding (pY) site with a conserved arginine residue and its role in downstream signaling.

STAT proteins, mainly STAT1, STAT3, and STAT5b, exhibit oncological properties. STAT1 has a role as an inhibitor of cell proliferation and assists apoptosis, acting as a tumor suppressor, and its absence caused mice to be more susceptible to carcinogen-induced tumors. STAT3 and STAT5b proteins are necessary for cancer cell formation and cell survival, and their mutations and overexpression can initiate cancer-related processes. STAT3 is associated with many types of cancer, including leukemias, melanoma, and prostate cancer, while STAT5b protein is associated with breast cancer, colorectal cancer, lung cancer, prostate cancer, and leukemias. STAT3 and STAT5b are key oncological targets, as their inhibition causes cancer-derived cells to undergo growth arrest or apoptosis, while healthy cells are mainly unaffected.

Identifying potent small-molecule STAT inhibitors is challenging due to the large, solvent-exposed PPI interface of these proteins. Virtual screening is a promising method to identify potent STAT inhibitors, as it is a cost-effective alternative to experimental high-throughput screening, and it can achieve much higher hit rates. To assist the identification of potent inhibitors, chemical databases targeting a specific domain have been compiled, like the SH2 Domain Targeted Library of OTAVAchemicals, which contains drug-like compounds with a predicted affinity to SH2 domains based on generic pharmacophore patterns. Due to the increased 3D-likeness and complexity of natural products, screening such libraries represents another knowledge-based option to identify hits against PPI targets.

In this study, we have examined the performance of the Deep Docking workflow against the STAT3 SH2 domain and benchmarked it against a virtual screen of a diversity-picked subset of the Mcule-in-stock data set (with its size matching the number of compounds docked in the Deep Docking workflow itself). Additionally, we have performed ‘traditional’, brute-force virtual screens against two smaller libraries (OTAVAchemicals SH2 Domain Targeted Library and a natural product library) as an alternative, ‘knowledge-based’ approach. The AI-based tool Deep Docking was used to recover virtual hits from the Enamine REAL library and the Mcule-in-stock library. The aims of this study were to examine whether uHTVS with an AI-based tool can reach significantly better hit rates than a ‘knowledge-based’ approach, and also to examine if AI-based tools can perform well even after training the deep learning model with less than 100,000 (instead of millions) compounds (economic screening workflow). Against STAT3-SH2, we have found Deep Docking to be capable of reaching exceptional hit rates, as high as 50.0%. Additionally, it can also perform well with the smaller Mcule-in-stock library containing ‘only’ millions (instead of billions) of compounds. Finally, the performance of this economic screening workflow was confirmed in prospective studies, identifying ligands of STAT5b SH2 and N-terminal domains. In addition to identifying new STAT inhibitors, this study highlights the effectiveness of uHTVS with an AI-based tool and provides a benchmark study for using AI-based tools with a smaller compound library.

Methods

Data Sets for the ‘Knowledge-Based’ Approach

Two data sets were examined for the ‘knowledge-based’ approach: the first containing 1,807 compounds specifically collected by generating and clustering pharmacophore models interacting with the SH2 domain (OtavaSH2 data set), the second containing 193,757 naturally occurring, or natural product-like compounds (NP data set) compiled from several vendors: LifeChemicals, ChemBridge, Asinex, and ChemDiv. Pan-assay interference compounds (PAINS compounds) were filtered out from these data sets in advance.

Data Sets for the AI-Based Approach

Two data sets were examined for the AI-based approach. The Enamine REAL data set contains 5.51 billion compounds, each complying with Lipinski’s rule of five and the Veber criteria and each compound being synthetically accessible. The Mcule-in-stock data set contains 5.59 million compounds, each being in stock and purchasable from Mcule. The Benchmark data set contained 117,500 chemically diverse compounds, selected from the Mcule-in-stock data set with the RDKit Diversity Picker node in KNIME. Just as for the data sets used in the ‘knowledge-based’ approach, the only filtering made in advance was to filter out PAINS compounds.
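To illustrate what a diversity pick does conceptually, the sketch below implements a greedy MaxMin selection on Tanimoto distances in plain Python. It is an assumption that this resembles what the KNIME node does internally; the toy fingerprints (sets of on-bit indices), the compound names, and the function names are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints: Dict[str, Set[int]], n_pick: int, seed_id: str) -> List[str]:
    """Greedy MaxMin: repeatedly add the compound farthest from the picked set."""
    picked = [seed_id]
    # Distance from every compound to its nearest already-picked compound.
    nearest = {
        cid: tanimoto_distance(fp, fingerprints[seed_id])
        for cid, fp in fingerprints.items()
    }
    while len(picked) < min(n_pick, len(fingerprints)):
        candidates = [cid for cid in fingerprints if cid not in picked]
        best = max(candidates, key=lambda cid: nearest[cid])
        picked.append(best)
        # Update nearest-picked distances with the newly added compound.
        for cid, fp in fingerprints.items():
            nearest[cid] = min(nearest[cid], tanimoto_distance(fp, fingerprints[best]))
    return picked


# Hypothetical toy library: four compounds with made-up fingerprint bits.
toy_library = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},    # close analog of cpd_A
    "cpd_C": {10, 11, 12},    # unrelated scaffold
    "cpd_D": {1, 2, 10, 11},  # shares bits with both A and C
}
selection = maxmin_pick(toy_library, n_pick=3, seed_id="cpd_A")
print(selection)
```

In this toy case the close analog cpd_B is the one left out, which is the behavior a diversity pick is meant to have. A production run on millions of compounds would rely on an optimized implementation rather than this quadratic loop.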

Docking and Virtual Screening

STAT3 SH2 Domain

The appropriate X-ray structure for docking was selected by carrying out a retrospective virtual screening with a small data set containing 69 known actives for STAT3 from ChEMBL and 959 decoy molecules generated with the DUD-E database and evaluating the performance metrics AUC (Area Under the ROC Curve) and 1%, 2%, and 5% EF (Enrichment Factor) values based on the docking scores for each structure and docking setting. Preparation of the data set was carried out with LigPrep at a pH range of 7.4 ± 1.0, and the docking was performed with Glide in standard precision (SP) mode using different H-bond constraint settings inside the SH2 domain. Based on the performance metrics, the X-ray structure with the PDB ID 6QHD was used with an H-bond constraint (with Glide) with the R609 residue for all virtual screening runs against the STAT3 SH2 domain. R609 has been shown earlier to function as a key anchoring residue of the SH2 domain in STAT3. Performance metrics such as AUC and enrichment factor values (EF) for 6QHD are included in Table 1, while the ROC curve is included in Supporting Information Figure S1. The training set of known actives and decoys is included in Supplementary Data.

Table 1. Performance Metrics for 6QHD

Performance metric    Value
AUC                   0.887
EF(1%)                5.005
EF(2%)                6.674
EF(5%)                7.341
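For readers less familiar with these retrospective metrics, the following minimal sketch computes a ROC AUC and an enrichment factor from docking scores of labeled actives and decoys. The scores and labels are invented toy values, and the EF convention used here (hit fraction in the top x% divided by the overall active fraction) is a common one that is assumed rather than confirmed to be the exact definition behind the tabulated values.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random active is ranked above a random decoy.

    Docking scores are 'more negative is better', so an active wins when its
    score is lower than the decoy's. Ties count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores: List[float], labels: List[int], fraction: float) -> float:
    """EF = (actives in top fraction / size of top fraction) / (actives / total)."""
    n_total = len(scores)
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])  # best score first
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n_total)


# Hypothetical toy screen: 3 actives (label 1) and 7 decoys (label 0).
toy_scores = [-9.1, -8.7, -8.0, -7.5, -7.2, -6.9, -6.1, -5.8, -5.0, -4.2]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
ef20 = enrichment_factor(toy_scores, toy_labels, fraction=0.20)
print(f"AUC = {auc:.3f}, EF(20%) = {ef20:.2f}")
```

The pairwise AUC loop is quadratic and only meant for small retrospective sets such as the ones used here (roughly a thousand compounds).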

Ligand preparation for the OtavaSH2, NP, and Mcule-in-stock data sets was also carried out with LigPrep at a pH of 7.4 ± 1.0, and the docking calculations were done using Glide in SP mode. In the case of Enamine REAL, ligand preparation was carried out with the open-source programs Dimorphite-DL and RDKit, grid generation was carried out with AutoDockTools and AutoGrid, and the docking was carried out using AutoDockGPU. Despite its generally worse performance compared to Glide, AutoDockGPU was chosen in this case as an open-source alternative to significantly speed up the docking steps for the much larger Enamine REAL data set, utilizing GPU acceleration on an HPC cluster. As AutoGrid does not have the capacity to set H-bond constraints, docking with AutoDockGPU was performed without constraints.

Additionally, Deep Docking was also used with the data sets used for the AI-based approach, coupled with the appropriate docking algorithm (AutoDockGPU for Enamine REAL, Glide for Mcule-in-stock). Each Deep Docking workflow consisted of 11 iteration cycles, each with 7,500 compounds to be docked (in the first iteration, 3 × 7,500 compounds are docked for training, test, and validation sets) for the Mcule-in-stock data set, and each with 1,000,000 compounds to be docked (3 × 1,000,000 in the first iteration) for the Enamine REAL data set. For all Deep Docking runs, the number of hyperparameters was set to 12, the training time was set to a maximum of four hours, and the recall value was set to 0.9, i.e., the default choices described in the original paper. Deep Docking workflows were carried out using an in-house script that was published on our GitHub page (https://github.com/keserulab/uHTVS_toolkit) as part of our recent work.

STAT5b SH2 Domain

The X-ray structure with the PDB ID 6MBW was used for docking purposes as it was the only wild-type STAT5b X-ray structure available at the time. The Mcule-in-stock data set was prepared as described before. We note here that the Enamine REAL data set was not used for compound selection against the STAT5b SH2 domain as we deemed the Mcule-in-stock data set sufficient for the AI-based approach, based on our experience with the STAT3 SH2 domain.

For virtual screening, an H-bond constraint with the R618 residue was set as the analogue for the R609 residue in the STAT3 SH2 domain. To evaluate the quality of the receptor grid, a retrospective virtual screening was carried out using a data set containing 28 known actives for STAT5b found in the literature and 1650 decoy molecules generated with DUD-E, and the performance metrics AUC and EF values belonging to the best 1%, 2%, and 5% of compounds based on docking score were evaluated. Virtual screening settings for the AI-based approach were the same as described for the STAT3 SH2 domain. Performance metrics such as AUC and EF values for 6MBW are included in Table 2, while the ROC curve is included in Supporting Information Figure S2. The training set of known actives and decoys is included in Supplementary Data.

Table 2. Performance Metrics for 6MBW

Performance metric    Value
AUC                   0.850
EF(1%)                25.98
EF(2%)                12.60
EF(5%)                9.302

STAT5b NTD

As described earlier, the STAT5b NTD is a novel, promising target for STAT5b inhibition, as targeting it can inhibit the higher-order oligomerization processes of the STAT5b protein. As the information about this target is scarce, a ‘knowledge-based’ approach was not carried out for this target, only an AI-based approach utilizing the experiences gathered during the SH2 domain case studies, i.e., using the Mcule-in-stock data set and Glide SP as the docking algorithm. In the absence of a resolved experimental structure for the N-terminal domain at the time, the protein structure used for the virtual screening tasks was a homology model generated using the Prime protein modeling program based on the STAT3 N-terminal domain structure (PDB: 4ZIA).

Preparing the Mcule-in-stock data set was carried out the same way as described earlier. A docking grid was generated to include the handshake dimer interface, and no constraint was set. For the Deep Docking workflow, 22,500 compounds were docked in each iteration (3 × 22,500 in the first iteration), with the other Deep Docking settings kept as described earlier.

Overall Workflow

Figure 1 shows an overview of the workflows, targets, and data sets used in the current work. After the performance evaluation of the structures used for STAT3 and STAT5b (described earlier), first the virtual screening runs against STAT3 were carried out, and they were evaluated based on hit rates and the resources used. The AI-based workflow with the Mcule-in-stock data set was found to offer a good balance between performance and resource usage, and it was shown to convincingly outperform the benchmark VS workflow that consisted of docking the same number of diversity-picked compounds from the same data set (i.e., without AI augmentation). Thus, the Deep Docking workflow was applied prospectively for the virtual screening runs first against the STAT5b SH2 domain and then against the novel target STAT5b NTD.

Figure 1. Overview of the applied ‘knowledge-based’ and AI-based virtual screening workflows against the appropriate targets. Green and blue colorings highlight the ‘knowledge-based’ and AI-based approaches, respectively. (The orange data set represents a benchmark against the AI-based STAT3/Mcule-in-stock virtual screen, where the Deep Docking workflow is replaced by a simple diversity selection.)

Compound Selection

Compound selection was performed using the output from each virtual screening workflow. In the case of the ‘knowledge-based’ approach, this corresponds to the output acquired from the docking of the whole data sets. For the AI-based approach, first the top 20,000 (Mcule-in-stock) or 55,000 (Enamine REAL) predicted virtual hits based on their Virtual-Hit Likeness value (VHL value or also termed as p-value within the workflow) after the last iteration of the Deep Docking workflow were exported, then they were docked with Glide SP with an appropriate H-bond constraint, and the output from these dockings was used. Glide SP docking with an appropriate H-bond constraint was used for compound selection also for the Enamine REAL data set, based on our recent findings regarding the better pose prediction performance of Glide vs AutoDockGPU.

Compound selection was started by ranking the compounds based on (i) docking scores and (ii) docking score/heavy atom count (DS/HA) values. After the ranking, the binding modes were visually inspected until the desired number of compounds was selected, starting with the best (most negative) docking score or DS/HA values and working our way up. A total of 30 compounds was selected in the case of the OtavaSH2 data set, 20 in the case of the NP data set, 20 in the case of the Benchmark data set, 20 in the case of the Mcule-in-stock data set, and 10 in the case of the Enamine REAL data set, with half of the compounds being selected based on docking score (docking score ≤ −5.5) and the other half based on DS/HA values (DS/HA ≤ −0.2). We note here that the number of received and measured compounds might differ from these numbers due to unavailability at the moment of purchasing, poor solubility, or unsuccessful synthesis in the case of the Enamine REAL data set.

In general, small molecules were selected by visually inspecting their binding pose, while placing great emphasis on chemical diversity. Specifically, several structural and binding mode aspects were defined as guidance for compounds to be considered for purchasing and measurement. In terms of binding pose, a compound was eligible for being a virtual hit if it interacted with the key anchoring residue defined in the appropriate H-bond constraint (R609 for STAT3 SH2 domain, R618 for STAT5b SH2 domain), AND it had at least a total of two advantageous interactions with the protein (in addition to the anchoring residue), AND did not have a significant part of it in a solvent-exposed region (no contact with the protein). Structurally, a compound was considered if it had more than ten but fewer than 50 heavy atoms, had fewer than four amide bonds, and had a maximum of two carboxylic groups.
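The selection criteria above can be summarized as a simple filter-and-rank step. The sketch below is only an approximation under stated assumptions: in the study, the pose criteria were judged by visual inspection, whereas here they are reduced to hypothetical precomputed flags, and all compound records and names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DockedCompound:
    name: str
    docking_score: float     # docking score, more negative is better
    heavy_atoms: int
    amide_bonds: int
    carboxylic_groups: int
    contacts_anchor: bool    # interacts with the constrained Arg (hypothetical flag)
    extra_interactions: int  # favorable interactions besides the anchor
    solvent_exposed: bool    # significant part of the ligand has no protein contact

    @property
    def ds_per_ha(self) -> float:
        """Docking score normalized by heavy atom count (DS/HA)."""
        return self.docking_score / self.heavy_atoms


def passes_filters(c: DockedCompound) -> bool:
    """Structural and binding-mode criteria as listed in the text."""
    structural = (
        10 < c.heavy_atoms < 50
        and c.amide_bonds < 4
        and c.carboxylic_groups <= 2
    )
    pose = c.contacts_anchor and c.extra_interactions >= 2 and not c.solvent_exposed
    return structural and pose


def shortlist(compounds: List[DockedCompound], by: str = "score") -> List[str]:
    """Rank by docking score (<= -5.5) or DS/HA (<= -0.2), best first, then filter."""
    if by == "score":
        pool = [c for c in compounds if c.docking_score <= -5.5]
        pool.sort(key=lambda c: c.docking_score)
    else:
        pool = [c for c in compounds if c.ds_per_ha <= -0.2]
        pool.sort(key=lambda c: c.ds_per_ha)
    return [c.name for c in pool if passes_filters(c)]


# Hypothetical docking output, not data from the study.
docked = [
    DockedCompound("x1", -7.0, 25, 1, 1, True, 3, False),
    DockedCompound("x2", -6.0, 40, 0, 0, True, 2, False),
    DockedCompound("x3", -4.8, 12, 0, 1, True, 2, False),
    DockedCompound("x4", -8.0, 55, 2, 0, True, 4, False),  # too many heavy atoms
    DockedCompound("x5", -7.5, 30, 1, 1, False, 3, False),  # misses the anchor residue
]
by_score = shortlist(docked, by="score")
by_ds_ha = shortlist(docked, by="ds_ha")
print(by_score, by_ds_ha)
```

Keeping the two rankings separate mirrors the half-and-half selection described above: the DS/HA list favors small, efficient binders (x3) that a raw docking score ranking would miss.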

To cover as large a chemical space as possible for prospective screening, if two compounds had very similar structures (e.g., only differing by a single heavy atom or a single functional group), only one was purchased, with the lower docking score value being favored. (The only exceptions here were compounds S5_M1 and S5_M7, which only differed by a single hydroxyl group but displayed excellent DS/HA values of −0.417 and −0.393, respectively, and were both purchased and tested.)

Benchmarks and Evaluation

To benchmark the relevant docking score and DS/HA distributions, we have docked 20,000 (Mcule-in-stock) or 55,000 (Enamine REAL) randomly selected compounds from the data sets with Glide SP: these sets constituted the basis of comparing the docking score distributions (see Discussion).

Additionally, to provide a more general benchmark case against the whole Deep Docking workflow, we have used the RDKit Diversity Picker node in KNIME, to select 117,500 chemically diverse compounds from the Mcule-in-stock database (Benchmark data set), which is the exact same number that is docked in total during the Deep Docking workflow. Then, we docked these compounds with Glide SP against the STAT3 SH2 domain and selected 15 compounds for measurements, to have a clear baseline to evaluate the Deep Docking workflow against. (The docking and LigPrep parameters for the benchmark approaches were the same as described earlier.)
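One way to read the 117,500 figure is as the sum of the compounds docked during the 11 Deep Docking iterations and the 20,000 top-ranked predicted hits that were redocked for compound selection. This decomposition is an interpretation of the numbers given in the Methods, not something the text spells out, and the function below is a hypothetical helper written only to check the arithmetic.

```python
def docked_total(per_iteration: int,
                 n_iterations: int = 11,
                 first_iteration_sets: int = 3,
                 final_redock: int = 0) -> int:
    """Count docked compounds in a Deep Docking run.

    The first iteration docks `first_iteration_sets` samples (training, test,
    validation); each later iteration docks one additional training sample.
    `final_redock` is the number of top predicted hits docked at the end.
    """
    iterations = first_iteration_sets * per_iteration + (n_iterations - 1) * per_iteration
    return iterations + final_redock


# Mcule-in-stock vs. SH2 domains: 7,500 compounds per sample.
mcule_training = docked_total(7_500)
mcule_total = docked_total(7_500, final_redock=20_000)

# Enamine REAL: 1,000,000 compounds per sample, 55,000 redocked at the end.
enamine_training = docked_total(1_000_000)
enamine_total = docked_total(1_000_000, final_redock=55_000)

# STAT5b NTD run: 22,500 compounds per sample (iterations only).
ntd_training = docked_total(22_500)

print(mcule_training, mcule_total, enamine_training, enamine_total, ntd_training)
```

Under this reading, the Mcule-in-stock run trains on 97,500 docked compounds (under 100,000) and docks 117,500 in total (just under 120,000), consistent with the figures quoted elsewhere in the paper.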

In addition, to evaluate the performance of each Deep Docking run, the benchmarking statistics per iteration were also evaluated, including recall, precision, and ROC-AUC values. Recall and precision are defined as follows:

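$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.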

These benchmarking statistics were plotted in Supporting Information Figure S6.
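As a small worked example of these two definitions, the sketch below computes precision and recall from predicted and reference labels. Treating "reference positive" as a compound whose docking score passes the cutoff and "predicted positive" as a compound the DL model retains is an assumption about how the labels would be assigned; the label vectors are toy values.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels (1 = virtual hit, 0 = non-hit)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical test-set labels: model prediction vs. docking-based reference.
model_says_hit = [1, 1, 1, 1, 0, 0, 0, 0]
docking_says_hit = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(model_says_hit, docking_says_hit)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

With a recall target of 0.9, as used in this work, the model is tuned to retain most docking-defined hits at each iteration, accepting lower precision in exchange.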

Fluorescence Polarization Assay

The respective protocols for the expression and purification of the STAT3 SH2 and STAT5b SH2 domains were published in our recent works. Fluorescence polarization assays (FP-assays) were performed on a Molecular Devices SpectraMax iD5 Multimode Microplate Reader (San Jose, CA, USA) using Greiner black 384-well flat-bottom nonbinding microplates with 40 μL final well volumes. The fluorescent peptides (5-FAM-G(pTyr)LPQTV-NH2 and 5-FAM-G(pTyr)LVLDKW-NH2, purchased from GenScript Biotech Ltd., Piscataway, NJ, USA), as well as the proteins, were diluted with a buffer containing 50 mM NaCl, 10 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), 1 mM EDTA (ethylenediaminetetraacetic acid), 2 mM DTT (dithiothreitol), and 0.1% Triton X-100, pH 7.5. For STAT3 (127–770), the final concentration of the protein was 200 nM, and the fluorescent peptide (5-FAM-G(pTyr)LPQTV-NH2) was added at a final concentration of 5 nM; for STAT5b (136–703), the final concentration of the protein was 500 nM, and the fluorescent peptide (5-FAM-G(pTyr)LVLDKW-NH2) was added at a final concentration of 10 nM. The wells were treated with varying concentrations of inhibitors (1250 to 0.11 μM final concentrations). The final DMSO content was 2.5%. After mixing, the protein and inhibitors were incubated at room temperature for 30 min, then the fluorescent peptide was added, and the plate was incubated for another 20 min prior to the fluorescence readout (excitation wavelength: 475 nm, emission wavelength: 520 nm). The measurements were carried out in at least three biological replicates. Fluorescence polarization was calculated from the perpendicular and parallel fluorescence intensities and then plotted against concentration. The IC50 values were determined after fitting quadratic dose–response curves on the data points in GraphPad Prism 8.0.1. A compound was identified as active if it had a mean IC50 value under 100 μM. All plotted curves and data points for the hit compounds are included in Section 2 of the Supporting Information.
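The polarization calculation mentioned above follows the standard textbook relation between the parallel and perpendicular emission intensities. The sketch below is a minimal version of it; the instrument G-factor (defaulting to 1.0) and the intensity values are assumptions for illustration, since the text does not report them.

```python
def polarization_mP(i_parallel: float, i_perpendicular: float, g_factor: float = 1.0) -> float:
    """Fluorescence polarization in millipolarization units (mP).

    P = (I_par - G * I_perp) / (I_par + G * I_perp), reported as 1000 * P.
    """
    numerator = i_parallel - g_factor * i_perpendicular
    denominator = i_parallel + g_factor * i_perpendicular
    if denominator == 0:
        raise ValueError("Total intensity is zero; polarization is undefined.")
    return 1000.0 * numerator / denominator


# Hypothetical raw intensities for a bound (high mP) and a displaced (low mP) tracer.
bound_mP = polarization_mP(1200.0, 800.0)
displaced_mP = polarization_mP(1050.0, 950.0)
print(bound_mP, displaced_mP)
```

An inhibitor that displaces the fluorescent peptide from the SH2 domain lowers the measured polarization, which is what the dose–response fit then quantifies.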

Isothermal Titration Calorimetry

The detailed protocol for the expression and purification of the STAT5b-NTD was described in our recent work. ITC measurements were carried out on a MicroCal PEAQ-ITC microcalorimeter (Malvern Instruments, Worcestershire, UK). All protein and ligand solutions were composed of 1X PBS (pH 7.4) buffer containing 8% (V/V) DMSO to avoid buffer mismatch. Protein solutions containing 25–26 μM protein were prepared in batches prior to measurements by diluting 676 μM stock solution (stored in 1X PBS (pH 7.4) buffer at −80 °C) warmed to room temperature with 1X PBS (pH 7.4) buffer solution and an appropriate amount of DMSO. Ligand solutions were prepared by first dissolving the ligands in DMSO to acquire a 50 mM stock solution, which was then diluted to 1 mM (in the case of S5N_M2 ligand) or 4 mM (all other ligands) concentration using 1X PBS (pH 7.4) buffer solution. Appropriate amounts of DMSO were also added to the solution containing 1 mM ligand to yield an 8% (V/V) DMSO content. The protein solution was loaded into the sample cell in a volume of 200 μL and was titrated at 25 °C with the ligand solution at a stirring speed of 1000 rpm and a reference heat rate of 10 μcal/s in a high feedback mode. The initial delay was 120 s, and the injection program was composed of an initial first injection with 0.5 μL over 2 s, followed by 18 injections with 2 μL over 4 s. Injections were performed every 180 s. Blank measurements consisting of titrating the ligand into the buffer (1X PBS (pH 7.4) containing 8% (V/V) DMSO) were carried out to correct for the heat of dilution for all ligands using the appropriate injection program. Data were analyzed using the MicroCal PEAQ-ITC analysis software (version 1.22) by fitting a single-site binding curve. A compound was identified as active if it had a mean Kd value under 100 μM. Binding curves of the identified active compounds are included in Supporting Information Figures S3–S5.
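The DMSO matching described above can be checked with simple dilution arithmetic: diluting a 50 mM DMSO stock to 4 mM already carries 8% (V/V) DMSO, whereas diluting to 1 mM carries only 2%, which is why extra DMSO is needed for that sample. The sketch below works through these numbers; the 300 μL batch volume is a hypothetical value, and volume additivity is assumed.

```python
def stock_fraction(stock_conc: float, target_conc: float) -> float:
    """Volume fraction of stock solution needed to reach the target concentration."""
    if target_conc > stock_conc:
        raise ValueError("Target concentration exceeds stock concentration.")
    return target_conc / stock_conc


TARGET_DMSO = 0.08  # 8% (V/V) DMSO in all solutions

# Ligand solutions prepared from a 50 mM stock in pure DMSO.
dmso_at_4mM = stock_fraction(50.0, 4.0)   # DMSO fraction carried by the stock
dmso_at_1mM = stock_fraction(50.0, 1.0)
extra_dmso_1mM = TARGET_DMSO - dmso_at_1mM  # additional DMSO fraction to add

# Protein: 676 uM stock diluted to 25 uM in a hypothetical 300 uL batch.
batch_volume_uL = 300.0
protein_stock_uL = stock_fraction(676.0, 25.0) * batch_volume_uL
protein_dmso_uL = TARGET_DMSO * batch_volume_uL

# Total ligand volume injected: one 0.5 uL injection plus 18 injections of 2 uL.
total_injected_uL = 0.5 + 18 * 2.0

print(dmso_at_4mM, dmso_at_1mM, extra_dmso_1mM)
print(round(protein_stock_uL, 2), protein_dmso_uL, total_injected_uL)
```

The 4 mM ligand solutions therefore match the 8% DMSO buffer without further adjustment, consistent with the text mentioning added DMSO only for the 1 mM sample.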

Results

The active form of the STAT proteins is the phosphorylated dimer form. The direct method to inhibit STAT protein dimerization includes direct small molecule binding to the SH2 domain or, as a new approach, the NTD. Of the two target domains, the SH2 domain is the more often targeted, with the NTD being a novel target for inhibiting STAT activation. Numerous potent small molecules have already been published for the SH2 domain of STAT3 and, to a lesser extent, STAT5b; however, to date, there is still no clinically approved, direct STAT inhibitor.

Leveraging the available structural and biochemical information against the various STAT3 and STAT5b protein domains, we have utilized the thoroughly studied STAT3 SH2 domain as a primary benchmark of the Deep Docking workflow vs “traditional” (or “knowledge-based”) workflows, one of which applies “brute-force” docking of smaller libraries, the other using a diverse selection from a larger data set to produce a smaller, chemically diverse data set. Learning from our experiences here, we moved on to discover new inhibitors of more challenging PPI oncotargets: the less thoroughly explored STAT5b-SH2 and the completely novel STAT5b-NTD.

Benchmarking and Method Comparison against the STAT3 SH2 Domain

‘Knowledge-Based’ Approach

Virtual screening of the OtavaSH2 data set resulted in 27 compounds, virtual screening of the NP data set resulted in 17 compounds, and last, virtual screening of the Benchmark data set resulted in 15 compounds to be measured by the FP assay against the STAT3 SH2 domain. For the structures of all chosen compounds, as well as their docking scores, DS/HA, and IC50 values, please refer to the Supplementary Data. The FP-assay measurements identified one active compound from the OtavaSH2 data set, S3_O10 (Figure 2), corresponding to a hit rate of 3.7% (1/27), while no active compound was identified from either the NP data set or the Benchmark data set. The hit compound contains a carboxylic group which interacts with the R609 residue, and it also contains a coumarin motif. A weakly binding compound (mean IC50 value higher than 100 μM) from the Benchmark data set (S3_B12) was also identified (included in the Supplementary Data).

Figure 2. (a) Structure of the identified active from the OtavaSH2 data set against the STAT3 SH2 domain, with its corresponding mean IC50 value. (b) Binding mode of S3_O10 against the SH2 domain of STAT3 (PDB ID: 6QHD).

AI-Based Approach

The Deep Docking-assisted virtual screening of the Mcule-in-stock and Enamine REAL data sets resulted in 16 and eight compounds, respectively, to be ordered for FP-assay measurements against the STAT3 SH2 domain. For the structures of all chosen compounds, as well as their docking scores, DS/HA, and IC50 values, please refer to the Supplementary Data. The FP-assay measurements identified three active compounds, S3_M2, S3_M5, and S3_M13, from the Mcule-in-stock data set, and four active compounds, S3_E4, S3_E5, S3_E6, and S3_E7, from the Enamine REAL data set (Figure 3), corresponding to hit rates of 19% (3/16) and 50% (4/8), respectively. One weakly binding compound from the Mcule-in-stock data set (S3_M15) and two weakly binding compounds from the Enamine REAL data set (S3_E3 and S3_E8) were also identified (included in Supplementary Data).

Figure 3. (a) Structures of the identified actives from the Mcule-in-stock data set against STAT3 SH2 domain, with their corresponding mean IC50 values. (b) Binding mode of the most potent active compound from the Mcule-in-stock data set, S3_M5, against the SH2 domain of STAT3 (PDB ID: 6QHD). (c) Structures of the identified actives from the Enamine REAL data set against STAT3 SH2 domain, with their corresponding IC50 values. (d) Binding mode of the most potent active compound from the Enamine REAL data set, S3_E4, against the SH2 domain of STAT3 (PDB ID: 6QHD).

The identified active compounds from the Mcule-in-stock data set all contain carboxylic acid groups (one or two) which interact with the R609 residue, which may explain their stronger affinity toward STAT3.

The two most potent active compounds from the Enamine REAL data set, S3_E4 and S3_E6, both have IC50 values in the low/mid micromolar range and both contain a 1,3-benzodioxole motif. These compounds have a very similar binding pose, with the 1,3-benzodioxole interacting with the R595 residue, which is another key residue for inhibition alongside R609; this interaction may explain their higher affinity.

We note here that the activities detected for the hit compounds described in this work are consistent with the IC50/Ki range of 0.52–114 μM for reported STAT3 inhibitors that were discovered by virtual screening. (Nonetheless, we have used the Aggregator Advisor tool to double-check the hits for possible artifacts: the script has not flagged any of the compounds as being similar to known aggregators.)

The Economic Screening Workflow Identifies New STAT5b SH2 Domain Binders

The results from the AI-based approach for the STAT3 SH2 domain showed that the Deep Docking-assisted virtual screening of the Mcule-in-stock data set can produce sufficiently good results. To further analyze the performance of Deep Docking with a data set containing only millions of compounds, 14 compounds from the Mcule-in-stock data set alone were ordered for FP-assay measurements against the STAT5b SH2 domain. For the structures of all chosen compounds, as well as their docking scores, DS/HA, and IC50 values, please refer to the Supplementary Data. The FP-assay measurements identified six active compounds, S5_M1, S5_M2, S5_M7, S5_M8, S5_M9, and S5_M13, from the Mcule-in-stock data set (Figure 4), corresponding to a hit rate of 42.9% (6/14). Of the six identified actives, S5_M1 and S5_M7 are structurally similar compounds that differ in one hydroxyl group, which interacts with the protein backbone at the L643 residue and might explain the higher activity of the compound bearing it. One weakly binding compound (S5_M6) was also identified (included in Supplementary Data).

Figure 4. (a) Structures of the identified actives from the Mcule-in-stock data set against the STAT5b SH2 domain, with their corresponding mean IC50 values. (b) Binding mode of the most potent active compound from the Mcule-in-stock data set, S5_M14, against the SH2 domain of STAT5b (PDB ID: 6MBW).

Compound Activity Analysis

Just as for STAT3, we compared the activity of newly identified inhibitors against documented ones in the case of STAT5b. The ChEMBL database contains significantly fewer inhibitors with a molecular weight below 700 Da and a reported IC50 value against STAT5b (15 compounds) than against STAT3 (235 compounds). Further analysis shows that, out of those 15 compounds, 12 are from one series of analogs that went through chemical optimization, with mean IC50 values ranging from 154 to 1400 nM, while the remaining three are from another series of analogs with mean IC50 values ranging from 1.4 to 37 μM. In our earlier work, we also collected STAT5b inhibitors identified via virtual screening, with IC50 or Ki values ranging from 9 nM to 80 μM.

From the available data, we can conclude that the activity of the identified inhibitors in the current work fits into the activity range of already identified inhibitors, although they are on the weaker side. As described earlier in the case of STAT3, our main goal was to identify structurally diverse chemical starting points, and for that goal, these newly identified compounds are suitable for further chemical optimization against STAT5b. The small number of documented inhibitors for this target also makes any newly identified actives valuable.

To check whether any of the identified compounds tend to aggregate, Aggregator Advisor was used. The script flagged none of the measured compounds as similar to known aggregators.

Prospective Use of the Economic AI-Based Workflow Identifies Dimerization Inhibitors of the Novel Oncotarget STAT5b N-Terminal Domain

Based on the excellent performance of the economic Deep Docking workflow in the two previous scenarios, we have applied this approach once again to discover dimerization inhibitors of the recently described oncotarget, the N-terminal domain of STAT5b. The Deep Docking-assisted virtual screening of the Mcule-in-stock data set led to the selection of 14 compounds to be ordered for ITC (isothermal titration calorimetry) measurements against the STAT5b NTD. For the structures of all chosen compounds, as well as their docking scores, DS/HA, and Kd values, please refer to the Supplementary Data. The ITC measurements identified three active compounds, S5N_M2, S5N_M4, and S5N_M6, from the Mcule-in-stock data set (Figure 5), corresponding to a hit rate of 21.4% (3/14). Two weakly binding compounds (mean Kd higher than 100 μM), S5N_M1 and S5N_M7, were also identified (included in Supplementary Data).

Figure 5. (a) Structures of the identified actives from the Mcule-in-stock data set against the STAT5b NTD, with their corresponding mean Kd values. (b) Binding mode of the most potent active compound from the Mcule-in-stock data set, S5N_M2, against the NTD of STAT5b.

Discussion

Docking Score Distribution and Runtime Analysis

Computational resources used and hit rates for each virtual screening run are summarized in Table 3 (and Supporting Information Table S1 for the Benchmark data set). To compare the computational time requirement of each virtual screening run, the total CPU time in seconds was divided by the number of compounds within the data set, giving t_one (Table 3). We note here that for the Deep Docking runs, the time of the model training (which used GPUs) was included in the real time; to compare the efficiency of the different workflows, the real time was multiplied by the number of CPU cores used to obtain the total CPU time and, from that, the t_one value. As such, t_one is a proxy: an interpolated average time that would be required to process one compound if the whole data set were evaluated in the same way (in reality, only a small fraction is docked in the Deep Docking workflow, while the rest are processed only by the deep learning model). For the virtual screenings in the ‘knowledge-based’ approach, the total CPU time (or likewise, the total elapsed time) was equal to the time required to complete the docking jobs, while in the AI-based approach, it is the time required to complete all 11 iteration steps (DL model training/refinement based on docking some compounds, and prediction of the rest of the data set) plus the time required to complete the post-DL docking jobs for the 20,000 (Mcule-in-stock) or 55,000 (Enamine REAL) predicted virtual hits. For virtual screenings done using Deep Docking, the CPU time was divided by the total number of compounds within the data set, instead of the number of actually docked compounds, to reflect the much greater time efficiency.
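The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative re-derivation, not the authors' script: the function names `cpu_time` and `t_one` and the dictionary layout are assumptions made for this example, while the numbers are transcribed from the resource table of this section.

```python
# Real time (s), CPU cores, and library size per run, transcribed from the
# resource table of this section. The dictionary layout is hypothetical.
RUNS = {
    "STAT3 SH2 / OtavaSH2":        {"real_s": 14_313,    "cores": 2,  "library": 1_807},
    "STAT3 SH2 / NP":              {"real_s": 807_653,   "cores": 4,  "library": 193_757},
    "STAT3 SH2 / Mcule-in-stock":  {"real_s": 1_830_125, "cores": 12, "library": 5_591_127},
    "STAT3 SH2 / Enamine REAL":    {"real_s": 1_997_656, "cores": 64, "library": 5_509_669_531},
    "STAT5b SH2 / Mcule-in-stock": {"real_s": 1_254_372, "cores": 12, "library": 5_591_127},
    "STAT5b NTD / Mcule-in-stock": {"real_s": 732_463,   "cores": 12, "library": 5_591_127},
}


def cpu_time(real_s: int, cores: int) -> int:
    """Total CPU time: elapsed (real) time multiplied by the CPU cores used."""
    return real_s * cores


def t_one(real_s: int, cores: int, library: int) -> float:
    """Projected CPU seconds per compound, using the WHOLE library as divisor
    (not only the compounds that were actually docked)."""
    return cpu_time(real_s, cores) / library


for name, run in RUNS.items():
    total = cpu_time(run["real_s"], run["cores"])
    print(f"{name}: CPU time = {total:,} s, t_one = {t_one(**run):.5g} s")
```

Because the divisor is the full library size, t_one for the Deep Docking runs should be read as a projected per-compound cost rather than a measured docking time.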

Table 3. Computational Resource Requirements and Hit Rates Summarized for Each Virtual Screening Run

| Target | STAT3 SH2 domain | STAT3 SH2 domain | STAT3 SH2 domain | STAT3 SH2 domain | STAT5b SH2 domain | STAT5b NTD |
| --- | --- | --- | --- | --- | --- | --- |
| Data set | OtavaSH2 | NP | Mcule-in-stock | Enamine REAL | Mcule-in-stock | Mcule-in-stock |
| Used approach | Knowledge-based | Knowledge-based | AI-based | AI-based | AI-based | AI-based |
| No. compounds in the data set | 1,807 | 193,757 | 5,591,127 | 5,509,669,531 | 5,591,127 | 5,591,127 |
| No. actually docked compounds (a) | 1,807 | 193,757 | 117,500 | 13,055,000 | 117,500 | 312,500 |
| Used CPU | Intel Xeon CPU E5-1660 v3 | Intel Xeon CPU E5-1660 v3 | AMD EPYC 7302P | HPC (AMD EPYC 7763) | AMD EPYC 7302P | AMD EPYC 7302P |
| No. CPU cores used | 2 | 4 | 12 | 64 | 12 | 12 |
| Used GPU | - | - | NVIDIA GeForce RTX 4070 | HPC (NVIDIA A100) | NVIDIA GeForce RTX 4070 | NVIDIA GeForce RTX 4070 |
| No. GPUs used | 0 | 0 | 1 | 12 | 1 | 4 |
| Model training real time (s) | - | - | 855,989 | 686,357 | 751,727 | 280,664 |
| Total real time (s) | 14,313 | 807,653 | 1,830,125 | 1,997,656 | 1,254,372 | 732,463 |
| CPU time (CPU s) | 28,626 | 3,230,612 | 21,961,500 | 127,849,984 | 15,052,464 | 8,789,556 |
| Calculated time requirement for one compound with one CPU core, t_one (s) (b) | 15.841 | 16.674 | 3.9279 | 0.023205 | 2.6922 | 1.5721 |
| No. compounds purchased and measured | 27 | 17 | 16 | 8 | 14 | 14 |
| No. hits | 1 | 0 | 3 | 4 | 6 | 3 |
| Hit rate (%) | 3.7 | 0.0 | 18.8 | 50.0 | 42.9 | 21.4 |

(a) For the OtavaSH2 and NP data sets, this number is equal to the number of compounds in the data set, while for the Mcule-in-stock and Enamine REAL data sets, this number is equal to the total number of compounds docked in each Deep Docking iteration plus the number of compounds docked after the Deep Docking iterations (20,000 for Mcule-in-stock, 55,000 for Enamine REAL).

(b) CPU time divided by the size of the data set.

Based on the t_one values, AI-based virtual screenings were around one order of magnitude faster for the Mcule-in-stock data set and around three orders of magnitude faster for the Enamine REAL data set, as compared with the traditional VS approach. It is important to note here that Deep Docking consists of multiple steps (model training, model evaluation, virtual hit prediction) alongside docking, whose efficiency is mainly affected by the number of CPUs used. By contrast, correct parallelization, resource management, and optimization have a greater importance for efficient AI-based virtual screening, compared to a traditional docking calculation. The virtual screening of the Mcule-in-stock data set against the STAT3 SH2 domain required only 6.80 times more CPU time than that of the NP data set, while covering 28.9 times more compounds, showcasing the efficiency of the Deep Docking workflow. By comparison, moving from the million-sized Mcule-in-stock set to the billion-sized Enamine REAL set (3 orders of magnitude) conveys only a 6× increase in CPU time, highlighting the added value of the all-around use of GPUs, both for docking and model training. There are more peculiar data pairs in Table 3 as well: the virtual screening run against the STAT5b NTD took little more than half as much CPU time to complete as the one against the STAT5b SH2 domain on the same infrastructure, while docking 2.66 times more compounds. Reasons may include the larger size of the docking grid and the use of an H-bond constraint for the SH2 domain, as well as the dependence of docking speed on molecular complexity, combined with the sampling characteristics of the Deep Docking workflow. That is, if a fairly complex molecule (whose docking is relatively slow) achieves a good docking score, Deep Docking labels it as a hit, and further iterations may include similarly large molecules, increasing the overall time required for the docking phases.
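The ratios quoted in this paragraph follow directly from the table values. The snippet below re-derives them as a plausibility check; the helper name `ratio` and the variable names are hypothetical, and the inputs are copied from the resource table of this section.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Plain quotient, kept as a function so each comparison reads clearly."""
    return numerator / denominator


# CPU time (CPU s), library sizes, and docked-compound counts from the table.
cpu_np, cpu_mcule_stat3, cpu_enamine = 3_230_612, 21_961_500, 127_849_984
cpu_stat5b_sh2, cpu_stat5b_ntd = 15_052_464, 8_789_556
lib_np, lib_mcule = 193_757, 5_591_127
docked_sh2, docked_ntd = 117_500, 312_500

# Mcule-in-stock vs NP against STAT3 SH2: CPU-time cost vs library coverage.
cpu_cost_factor = ratio(cpu_mcule_stat3, cpu_np)
coverage_factor = ratio(lib_mcule, lib_np)

# Enamine REAL vs Mcule-in-stock: CPU-time increase for ~1000x more compounds.
enamine_vs_mcule = ratio(cpu_enamine, cpu_mcule_stat3)

# STAT5b NTD vs STAT5b SH2 on the same infrastructure.
ntd_cpu_fraction = ratio(cpu_stat5b_ntd, cpu_stat5b_sh2)
ntd_docked_factor = ratio(docked_ntd, docked_sh2)

print(f"CPU cost factor (Mcule/NP):      {cpu_cost_factor:.2f}")
print(f"Coverage factor (Mcule/NP):      {coverage_factor:.1f}")
print(f"CPU cost factor (Enamine/Mcule): {enamine_vs_mcule:.1f}")
print(f"NTD/SH2 CPU-time fraction:       {ntd_cpu_fraction:.2f}")
print(f"NTD/SH2 docked-compound factor:  {ntd_docked_factor:.2f}")
```

Note that these comparisons mix different CPU models and GPU counts, so they indicate orders of magnitude rather than a controlled benchmark.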

Comparing the hit rates (Table 3), virtual screenings with the OtavaSH2 data set (3.7%) against the STAT3 SH2 domain and with the Mcule-in-stock data set against the STAT3 SH2 domain (18.8%) and STAT5b-NTD (21.4%) resulted in hit rates between 1% and 40%, which is the typical hit rate range in virtual screening. By comparison, screening the Benchmark data set against the STAT3 SH2 domain resulted in only a single weakly binding compound (S3_B12, with a mean IC50 value slightly higher than 100 μM) from the 15 measured compounds (technically a 0% hit rate), highlighting the added value of Deep Docking in hit identification. In the case of virtual screening with the Enamine REAL data set against the STAT3 SH2 domain and the Mcule-in-stock data set against the STAT5b SH2 domain, hit rates were 50.0% and 42.9%, respectively, which are exceptionally good values.
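As a small worked example, the hit rates can be recomputed from the numbers of hits and measured compounds and compared with the 1% to 40% range quoted above. The function names and the tuple layout are assumptions made for this sketch; the counts come from the resource table of this section.

```python
# (target / data set, number of hits, number of compounds measured)
SCREENS = [
    ("STAT3 SH2 / OtavaSH2", 1, 27),
    ("STAT3 SH2 / NP", 0, 17),
    ("STAT3 SH2 / Mcule-in-stock", 3, 16),
    ("STAT3 SH2 / Enamine REAL", 4, 8),
    ("STAT5b SH2 / Mcule-in-stock", 6, 14),
    ("STAT5b NTD / Mcule-in-stock", 3, 14),
]

# Typical virtual screening hit rate range quoted in the text (percent).
TYPICAL_MIN, TYPICAL_MAX = 1.0, 40.0


def hit_rate(hits: int, measured: int) -> float:
    """Hit rate in percent: confirmed actives over compounds measured."""
    if measured <= 0:
        raise ValueError("at least one compound must be measured")
    return 100.0 * hits / measured


def classify(rate: float) -> str:
    """Place a hit rate relative to the typical range."""
    if rate < TYPICAL_MIN:
        return "below typical range"
    if rate > TYPICAL_MAX:
        return "above typical range"
    return "within typical range"


for name, hits, measured in SCREENS:
    rate = hit_rate(hits, measured)
    print(f"{name}: {rate:.1f}% ({hits}/{measured}), {classify(rate)}")
```

With so few compounds measured per screen, each additional hit shifts the rate by several percentage points, which is worth keeping in mind when comparing runs.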

These results show that virtual screening with an AI-based approach can cover significantly more compounds within the same amount of time, while also being capable of producing better or, in some cases, exceptionally high hit rates. The results also show that an AI-based approach can work well even when evaluating a smaller data set.

Docking score distributions are analyzed in Figure 6. For the ‘knowledge-based’ approach, all compounds from the OtavaSH2 and NP data sets were evaluated, while for the AI-based approach, the top 20,000 compounds exported by Deep Docking from the Mcule-in-stock data set and the top 55,000 compounds exported from the Enamine REAL data set were evaluated. For additional information about the effectiveness of Deep Docking in enriching compounds with low (favorable) docking scores, a randomly selected set of 20,000 (Mcule-in-stock) or 55,000 compounds (Enamine REAL) was also evaluated. A lower docking score correlates with a predicted stronger affinity; however, larger molecules with more heavy atoms tend to be biased toward low docking score values. To eliminate this bias, the docking score values were divided by the compounds’ heavy atom count (HA) to get DS/HA values. The distributions of the DS/HA values are also plotted in Figure 6.
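The size normalization described above amounts to a single division. The sketch below illustrates how it can reorder compounds; the compound names, docking scores, and heavy atom counts are invented for illustration and are not taken from the study, and `ds_per_ha` is a hypothetical helper name.

```python
def ds_per_ha(docking_score: float, heavy_atoms: int) -> float:
    """Docking score divided by heavy atom count (DS/HA).
    More negative values mean a better score per heavy atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return docking_score / heavy_atoms


# Hypothetical compounds: (name, docking score, heavy atom count).
compounds = [
    ("cpd_A", -8.0, 20),
    ("cpd_B", -8.0, 40),  # same score as cpd_A, but twice the size
    ("cpd_C", -6.0, 12),  # weaker raw score, but a small molecule
]

# Rank by DS/HA, most negative (most efficient) first.
ranked = sorted(compounds, key=lambda c: ds_per_ha(c[1], c[2]))

for name, score, ha in ranked:
    print(f"{name}: DS = {score:.1f}, HA = {ha}, DS/HA = {ds_per_ha(score, ha):.3f}")
```

In this toy set, the smallest compound ranks first by DS/HA even though its raw docking score is the least favorable, which is the bias correction the normalization is meant to provide.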

Figure 6. Docking score and DS/HA distributions of docked compounds, shown as box-and-whisker plots. In the case of the STAT3 SH2 domain, for the NP (NP column) and OtavaSH2 (Otava column) data sets, the distributions correspond to all compounds, while for the AI-based virtual screenings, they correspond to the docking scores and DS/HA values of the top 20,000 (DD_M column) or 55,000 (DD_E column) virtual hits predicted by Deep Docking from the Mcule-in-stock and Enamine REAL data sets, respectively. Random_M and Random_E columns correspond to the docking score and DS/HA distributions for 20,000 or 55,000 randomly selected compounds from the corresponding data set, respectively. In the case of the STAT5b SH2 domain (colored green) and STAT5b NTD (colored blue), the DD columns (DD_SH2 and DD_NTD) correspond to the docking score and DS/HA distributions for the top 20,000 compounds from the Mcule-in-stock data set for the appropriate target, while the Random columns (Random_SH2 and Random_NTD) correspond to the docking score and DS/HA distributions for the 20,000 randomly selected compounds from the Mcule-in-stock data set for the appropriate target.

For the virtual screens against the STAT3 SH2 domain, the docking score distributions were similar for the Enamine REAL, OtavaSH2, and NP data sets, while the Mcule-in-stock data set shows a clear shift toward lower docking score values. Comparing the results from the Deep Docking runs with those from the randomly selected compounds, the run with the Mcule-in-stock data set shows a clear shift toward lower docking score values, meaning that Deep Docking successfully enriched compounds with low docking scores. Interestingly, the Deep Docking run with the Enamine REAL data set produced only slightly improved results; however, in concordance with its main objective of identifying strong outliers, it produced numerous compounds with docking score values below −6, which the random selection did not. The insignificant shift of the overall distribution could be explained by the fact that only 0.24% of the whole data set was actually evaluated with docking, vs 2.1% in the case of the Mcule-in-stock set. Another possible explanation is the use of AutoDockGPU as the docking algorithm instead of Glide SP, as its different pose prediction performance may have influenced the binding energy ranking and, ultimately, the model training.
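The docking coverage figures quoted above (0.24% vs 2.1%) can be reproduced from the numbers of docked compounds and library sizes in the resource table. The function name `docked_fraction` is a hypothetical choice for this sketch.

```python
def docked_fraction(docked: int, library: int) -> float:
    """Percentage of the library that was physically docked."""
    if library <= 0:
        raise ValueError("library size must be positive")
    return 100.0 * docked / library


# Values from the resource table (STAT3 SH2 domain runs).
enamine_pct = docked_fraction(13_055_000, 5_509_669_531)
mcule_pct = docked_fraction(117_500, 5_591_127)

print(f"Enamine REAL:   {enamine_pct:.2f}% of the library docked")
print(f"Mcule-in-stock: {mcule_pct:.1f}% of the library docked")
```

The roughly ninefold lower coverage of the Enamine REAL library is one of the two explanations the text offers for the weaker shift of its docking score distribution.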

Regarding the DS/HA values, just as for the docking score distributions, only the Deep Docking run with the Mcule-in-stock data set showed a clear shift toward more negative values, while also having numerous outliers in the more negative DS/HA value region. As the DS/HA value can be considered a computational proxy of size-independent ligand efficiency, its distribution also gives information about the similarity of compounds regarding the marginal benefit of adding more heavy atoms: a smaller range of data means a more homogeneous marginal benefit for growing the compounds. Here, all DS/HA distributions are fairly narrow, with the exception of the Mcule-in-stock data set sampled by the Deep Docking workflow. By contrast, there is little difference between the DD-prioritized portion of the Enamine REAL data set vs a random selection, which might be explained by the lower coverage of this data set by docking (see above) and/or the finite set of chemical transformations and building blocks composing the Enamine REAL space vs the less restricted, albeit much smaller, Mcule-in-stock set.

For the virtual screenings against the STAT5b SH2 domain and NTD, only the AI-based workflow was executed, but a comparison with a randomly drawn set of molecules reinforces the findings reported above. More precisely, the Deep Docking runs produced a larger amount of more negative outliers and a better all-around result than random selection.
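One way to make the comparison between a Deep Docking-prioritized set and a random set concrete is to count the scores that fall below the lower whisker of the random distribution. The sketch below assumes the common box-plot convention of Q1 − 1.5 × IQR for the whisker, which the text does not specify, and it uses invented toy scores; `lower_whisker` and `strong_outliers` are hypothetical helper names.

```python
import statistics


def lower_whisker(scores: list[float]) -> float:
    """Lower box-plot whisker limit, Q1 - 1.5 * IQR (assumed convention)."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1 - 1.5 * (q3 - q1)


def strong_outliers(scores: list[float], reference: list[float]) -> list[float]:
    """Scores that are more negative than the lower whisker of `reference`."""
    cutoff = lower_whisker(reference)
    return [s for s in scores if s < cutoff]


# Toy docking scores (hypothetical, for illustration only).
random_set = [-4.0, -4.2, -4.5, -4.8, -5.0, -5.1, -5.3, -5.5]
dd_set = [-4.1, -4.6, -5.0, -5.2, -5.6, -6.4, -7.1, -7.8]

print("cutoff from random set:", round(lower_whisker(random_set), 4))
print("strong outliers in DD set:", strong_outliers(dd_set, random_set))
print("strong outliers in random set:", strong_outliers(random_set, random_set))
```

Using the random set as the reference keeps the cutoff independent of the prioritized set, so an enrichment of strong outliers is not masked by a shift in the prioritized distribution itself.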

These results, especially those on the Enamine REAL data set, show that even without a systematic shift toward a more favorable docking score distribution, the Deep Docking workflow can reliably identify strong outliers, translating into an increased chance of identifying active compounds, ultimately resulting in exceptional hit rates (42.9% for STAT5b SH2 domain and 50.0% for STAT3 SH2 domain). As for the ‘knowledge-based’ approaches, interestingly the virtual screen with the NP data set did not produce any actives, despite a comparatively high number of strong outliers (vs the OtavaSH2 set) and our hypothesis that the larger complexity of these compounds would be beneficial for targeting more challenging, shallow, and diffuse PPI sites.

In the case of the OtavaSH2 data set, which (after the NP set) produced the lowest hit rate (3.7%) from the largest set of compounds tested (27 for STAT3-SH2), there were not many strong outliers in either docking score or DS/HA values. (This ultimately corroborates the lower chances of hit identification expected from compounds having more positive docking score values.) The lack of hits from the NP data set also challenges the notion of selecting a screening deck based on simple heuristics (e.g., “more complex molecules will confer a higher probability for finding hits against difficult PPI binding sites”) and highlights the added value of screening a larger chemical space.

Conclusions

In this work, we have compared the performance of the AI-based Deep Docking workflow in identifying novel inhibitors of difficult PPI targets with that of traditional or ‘knowledge-based’ workflows. The SH2 domains of the STAT3 and STAT5b proteins and the N-terminal domain of STAT5b, a novel oncotarget, were chosen as examples. For the ‘knowledge-based’ approach, we have used two data sets, one specifically designed for SH2 domains (OtavaSH2 data set) and the other containing natural products or natural product-like compounds (NP data set). For the AI-based approach, we have used two large, but different-sized data sets: the Mcule-in-stock set of 5.59M commercially available compounds and the Enamine REAL set of 5.51B synthesizable, on-demand virtual compounds. We have also used a Benchmark data set created by diverse selection from the Mcule-in-stock data set.

Compared to a hit rate of 3.7% from the OtavaSH2 library and a surprising lack of hits from the NP set, the Deep Docking workflow resulted in exceptional hit rates of 18.8%, 42.9%, and 21.4% from the Mcule-in-stock library (against STAT3-SH2, STAT5b-SH2, and STAT5b-NTD, respectively) and a 50.0% hit rate from the Enamine REAL library against the STAT3 SH2 domain.

In terms of runtime, while the AI-based approach requires more total computational time in accordance with the larger number of compounds within the data set, the projected CPU time of evaluating a single compound is up to multiple orders of magnitude smaller. In addition, the results show that the AI-based approach can produce better hit rates, even when using a relatively small number of compounds for model training (i.e., fewer than in its originally intended use), than the ‘knowledge-based’ approach does after docking an altogether comparable number of compounds. Practically, if the docking of ∼120,000 compounds is planned in a ‘brute-force’ way (docking every single compound), our results suggest that running an AI-based virtual screening on a data set containing 5–10 million compounds is the better alternative: it is still feasible without supercomputing resources and is expected to produce better hit rates.

Docking score and DS/HA distributions were also analyzed, showcasing the superior ability of the AI-based approach to identify strong outliers with large negative docking scores (and thus increase the chance of identifying hits), compared to both the ‘knowledge-based’ approach and a randomly selected set.

Lastly, we highlight that the reported virtual screens have successfully identified several new inhibitors of difficult PPI-type oncotargets. The activity ranges are in line with expectations of screening hits against this type of target, while the hit rates can be considered to be excellent or even exceptional for the AI-based workflow and in line with expectations for the traditional workflow (except for the NP set). While the STAT3 SH2 domain has a larger number of reported inhibitors, these compounds are a significant addition to the much smaller set of known STAT5b-SH2 inhibitors and, especially, to the reported inhibitors of the STAT5b N-terminal domain, where to the best of our knowledge, only two compounds were disclosed in our recent work. These results, together with the performance analysis of different virtual screening protocols, suggest utility of AI-based uHTVS workflows even on moderate infrastructure with a moderate-to-ultra large screening deck.

Supplementary Material

ci5c00907_si_001.xlsx (8.9MB, xlsx)
ci5c00907_si_002.pdf (845.7KB, pdf)

Acknowledgments

The authors thank Elvin de Araujo and Qirat Ashraf (University of Toronto) for providing the STAT proteins for the experimental work. This work was supported by the National Research Development and Innovation Office of Hungary [contracts FK146063 to D.B., K135150 and PharmaLab (RRF-2.3.1-21-2022-00015) to G.M.K.]. The work of D.B. was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. We acknowledge the HPC time and support of the Governmental Information Technology Development Agency, Hungary.

Training sets, docking and bioassay results and source data of the figures are published in the Supplementary Data file. Scripts used for the Deep Docking workflow are available at https://github.com/keserulab/uHTVS_toolkit.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c00907.

  • Structures (in SMILES format) of the used compounds in the training set for retrospective virtual screening against STAT3 SH2 domain and STAT5b SH2 domain; structures (in SMILES format) of the measured compounds from the virtual screening runs with their code names, IC50 or Kd values, docking scores and DS/HA values; docking scores and DS/HA values for all of the docked compounds; log(IC50) values of the newly identified compounds vs the documented inhibitors from ChEMBL against STAT3 (XLSX)

  • Supplementary figures with the ROC curves from the results of the retrospective virtual screening against STAT3 SH2 domain (Figure S1) and STAT5b SH2 domain (Figure S2); FP-assay results for all hit compounds for STAT3 SH2 domain and STAT5b SH2 domain (pages S3–S7); ITC binding curves for all hit compounds for STAT5b NTD (Figures S3–S7); table containing data about the virtual screening with the Benchmark data set (Table S1); Deep Docking benchmarking statistics for each Deep Docking run (Figure S8) (PDF)

T.V.S. and L.S.-S. have developed and performed computational workflows. N.P. and L.P. have performed experimental measurements. D.B. and G.M.K. have conceptualized and directed the study. All authors have participated in writing the manuscript.

The authors declare no competing financial interest.

References

  1. van Hilten N., Chevillard F., Kolb P.. Virtual Compound Libraries in Computer-Assisted Drug Discovery. J. Chem. Inf. Model. 2019;59(2):644–651. doi: 10.1021/acs.jcim.8b00737. [DOI] [PubMed] [Google Scholar]
  2. Irwin J. J., Tang K. G., Young J., Dandarchuluun C., Wong B. R., Khurelbaatar M., Moroz Y. S., Mayfield J., Sayle R. A.. ZINC20A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020;60(12):6065–6073. doi: 10.1021/acs.jcim.0c00675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bajusz D., Keserű G. M.. Maximizing the Integration of Virtual and Experimental Screening in Hit Discovery. Expert Opin. Drug Discovery. 2022;17(6):629–640. doi: 10.1080/17460441.2022.2085685. [DOI] [PubMed] [Google Scholar]
  4. Gentile F., Agrawal V., Hsing M., Ton A.-T., Ban F., Norinder U., Gleave M. E., Cherkasov A.. Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery. ACS Cent. Sci. 2020;6(6):939–949. doi: 10.1021/acscentsci.0c00229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Kalliokoski T.. Machine Learning Boosted Docking (HASTEN): An Open-Source Tool To Accelerate Structure-Based Virtual Screening Campaigns. Mol. Inform. 2021;40(9):2100089. doi: 10.1002/minf.202100089. [DOI] [PubMed] [Google Scholar]
  6. Popov K. I., Wellnitz J., Maxfield T., Tropsha A.. HIt Discovery Using Docking ENriched by GEnerative Modeling (HIDDEN GEM): A Novel Computational Workflow for Accelerated Virtual Screening of Ultra-Large Chemical Libraries. Mol. Inform. 2024;43(1):e202300207. doi: 10.1002/minf.202300207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Zhang X., Zhang O., Shen C., Qu W., Chen S., Cao H., Kang Y., Wang Z., Wang E., Zhang J., Deng Y., Liu F., Wang T., Du H., Wang L., Pan P., Chen G., Hsieh C.-Y., Hou T.. Efficient and Accurate Large Library Ligand Docking with KarmaDock. Nat. Comput. Sci. 2023;3(9):789–804. doi: 10.1038/s43588-023-00511-5. [DOI] [PubMed] [Google Scholar]
  8. Sadybekov A. A., Sadybekov A. V., Liu Y., Iliopoulos-Tsoutsouvas C., Huang X.-P., Pickett J., Houser B., Patel N., Tran N. K., Tong F., Zvonok N., Jain M. K., Savych O., Radchenko D. S., Nikas S. P., Petasis N. A., Moroz Y. S., Roth B. L., Makriyannis A., Katritch V.. Synthon-Based Ligand Discovery in Virtual Libraries of over 11 Billion Compounds. Nature. 2022;601(7893):452–459. doi: 10.1038/s41586-021-04220-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Chandraghatgi R., Ji H.-F., Rosen G. L., Sokhansanj B. A.. Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening. J. Chem. Inf. Model. 2024;64(9):3826–3840. doi: 10.1021/acs.jcim.4c00234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Cheng C., Beroza P.. Shape-Aware Synthon Search (SASS) for Virtual Screening of Synthon-Based Chemical Spaces. J. Chem. Inf. Model. 2024;64(4):1251–1260. doi: 10.1021/acs.jcim.3c01865. [DOI] [PubMed] [Google Scholar]
  11. REAL database - Enamine. https://enamine.net/compound-collections/real-compounds/real-database (accessed 2024-11-11).
  12. mcule database. https://mcule.com/database/ (accessed 2024-11-11).
  13. eXplore: the world’s largest commercially available chemical space. eMolecules. https://www.emolecules.com/explore (accessed 2024-11-11).
  14. Acharya A., Agarwal R., Baker M. B., Baudry J., Bhowmik D., Boehm S., Byler K. G., Chen S. Y., Coates L., Cooper C. J., Demerdash O., Daidone I., Eblen J. D., Ellingson S., Forli S., Glaser J., Gumbart J. C., Gunnels J., Hernandez O., Irle S., Kneller D. W., Kovalevsky A., Larkin J., Lawrence T. J., LeGrand S., Liu S.-H., Mitchell J. C., Park G., Parks J. M., Pavlova A., Petridis L., Poole D., Pouchard L., Ramanathan A., Rogers D. M., Santos-Martins D., Scheinberg A., Sedova A., Shen Y., Smith J. C., Smith M. D., Soto C., Tsaris A., Thavappiragasam M., Tillack A. F., Vermaas J. V., Vuong V. Q., Yin J., Yoo S., Zahran M., Zanetti-Polzi L.. Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19. J. Chem. Inf. Model. 2020;60(12):5832–5852. doi: 10.1021/acs.jcim.0c01010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Vermaas J. V., Sedova A., Baker M. B., Boehm S., Rogers D. M., Larkin J., Glaser J., Smith M. D., Hernandez O., Smith J. C.. Supercomputing Pipelines Search for Therapeutics Against COVID-19. Comput. Sci. Eng. 2021;23(1):7–16. doi: 10.1109/MCSE.2020.3036540. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Gadioli D., Vitali E., Ficarelli F., Latini C., Manelfi C., Talarico C., Silvano C., Cavazzoni C., Palermo G., Beccari A. R.. EXSCALATE: An Extreme-Scale Virtual Screening Platform for Drug Discovery Targeting Polypharmacology to Fight SARS-CoV-2. IEEE Trans. Emerg. Top. Comput. 2023;11(1):170–181. doi: 10.1109/TETC.2022.3187134. [DOI] [Google Scholar]
  17. Orlova A., Wagner C., de Araujo E. D., Bajusz D., Neubauer H. A., Herling M., Gunning P. T., Keserű G. M., Moriggl R.. Direct Targeting Options for STAT3 and STAT5 in Cancer. Cancers. 2019;11(12):1930. doi: 10.3390/cancers11121930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Bromberg J.. Stat Proteins and Oncogenesis. J. Clin. Invest. 2002;109(9):1139–1142. doi: 10.1172/JCI15617. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Levy D. E.. Physiological Significance of STAT Proteins: Investigations through Gene Disruption in Vivo. Cell. Mol. Life Sci. CMLS. 1999;55(12):1559–1567. doi: 10.1007/s000180050395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Swihart K., Fruth U., Messmer N., Hug K., Behin R., Huang S., Del Giudice G., Aguet M., Louis J. A.. Mice from a Genetically Resistant Background Lacking the Interferon Gamma Receptor Are Susceptible to Infection with Leishmania Major but Mount a Polarized T Helper Cell 1-Type CD4+ T Cell Response. J. Exp. Med. 1995;181(3):961–971. doi: 10.1084/jem.181.3.961. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Wang Z. E., Reiner S. L., Zheng S., Dalton D. K., Locksley R. M.. CD4+ Effector Cells Default to the Th2 Pathway in Interferon Gamma-Deficient Mice Infected with Leishmania Major. J. Exp. Med. 1994;179(4):1367–1371. doi: 10.1084/jem.179.4.1367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Buchmeier N. A., Schreiber R. D.. Requirement of Endogenous Interferon-Gamma Production for Resolution of Listeria Monocytogenes Infection. Proc. Natl. Acad. Sci. U. S. A. 1985;82(21):7404–7408. doi: 10.1073/pnas.82.21.7404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Takeda K., Noguchi K., Shi W., Tanaka T., Matsumoto M., Yoshida N., Kishimoto T., Akira S.. Targeted Disruption of the Mouse Stat3 Gene Leads to Early Embryonic Lethality. Proc. Natl. Acad. Sci. U. S. A. 1997;94(8):3801–3804. doi: 10.1073/pnas.94.8.3801. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Levy D. E., Lee C.. What Does Stat3 Do? J. Clin. Invest. 2002;109(9):1143–1148. doi: 10.1172/JCI15650. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Sano S., Itami S., Takeda K., Tarutani M., Yamaguchi Y., Miura H., Yoshikawa K., Akira S., Takeda J.. Keratinocyte-specific Ablation of Stat3 Exhibits Impaired Skin Remodeling, but Does Not Affect Skin Morphogenesis. EMBO J. 1999;18(17):4657–4668. doi: 10.1093/emboj/18.17.4657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Heinrich P. C., Behrmann I., Müller-Newen G., Schaper F., Graeve L.. Interleukin-6-Type Cytokine Signalling through the Gp130/Jak/STAT Pathway1. Biochem. J. 1998;334(2):297–314. doi: 10.1042/bj3340297. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Riley J. K., Takeda K., Akira S., Schreiber R. D.. Interleukin-10 Receptor Signaling through the JAK-STAT Pathway: Requirement for Two Distinct Receptor-Derived Signals for Anti-Inflammatory Action. J. Biol. Chem. 1999;274(23):16513–16521. doi: 10.1074/jbc.274.23.16513. [DOI] [PubMed] [Google Scholar]
  28. Chapman R. S., Lourenco P. C., Tonner E., Flint D. J., Selbert S., Takeda K., Akira S., Clarke A. R., Watson C. J.. Suppression of Epithelial Apoptosis and Delayed Mammary Gland Involution in Mice with a Conditional Knockout of Stat3. Genes Dev. 1999;13(19):2604–2616. doi: 10.1101/gad.13.19.2604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Liu X., Robinson G. W., Wagner K. U., Garrett L., Wynshaw-Boris A., Hennighausen L.. Stat5a Is Mandatory for Adult Mammary Gland Development and Lactogenesis. Genes Dev. 1997;11(2):179–186. doi: 10.1101/gad.11.2.179. [DOI] [PubMed] [Google Scholar]
  30. Nakajima H., Liu X.-W., Wynshaw-Boris A., Rosenthal L. A., Imada K., Finbloom D. S., Hennighausen L., Leonard W. J.. An Indirect Effect of Stat5a in IL-2-Induced Proliferation: A Critical Role for Stat5a in IL-2-Mediated IL-2 Receptor α Chain Induction. Immunity. 1997;7(5):691–701. doi: 10.1016/S1074-7613(00)80389-1. [DOI] [PubMed] [Google Scholar]
  31. Waxman D. J., Ram P. A., Pampori N. A., Shapiro B. H.. Growth Hormone Regulation of Male-Specific Rat Liver P450s 2A2 and 3A2: Induction by Intermittent Growth Hormone Pulses in Male but Not Female Rats Rendered Growth Hormone Deficient by Neonatal Monosodium Glutamate. Mol. Pharmacol. 1995;48(5):790–797. doi: 10.1016/S0026-895X(25)10535-X. [DOI] [PubMed] [Google Scholar]
  32. Liu B. A., Jablonowski K., Raina M., Arce M., Pawson T., Nash P. D.. The Human and Mouse Complement of SH2 Domain Proteins: Establishing the Boundaries of Phosphotyrosine Signaling. Mol. Cell. 2006;22(6):851–868. doi: 10.1016/j.molcel.2006.06.001. [DOI] [PubMed] [Google Scholar]
  33. Guanizo A. C., Fernando C. D., Garama D. J., Gough D. J.. STAT3: A Multifaceted Oncoprotein. Growth Factors. 2018;36(1–2):1–14. doi: 10.1080/08977194.2018.1473393. [DOI] [PubMed] [Google Scholar]
  34. Loh C.-Y., Arya A., Naema A. F., Wong W. F., Sethi G., Looi C. Y.. Signal Transducer and Activator of Transcription (STATs) Proteins in Cancer and Inflammation: Functions and Therapeutic Implication. Front. Oncol. 2019;9:48. doi: 10.3389/fonc.2019.00048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Kraskouskaya D., Duodu E., Arpin C. C., Gunning P. T.. Progress towards the Development of SH2 Domain Inhibitors. Chem. Soc. Rev. 2013;42(8):3337–3370. doi: 10.1039/c3cs35449k. [DOI] [PubMed] [Google Scholar]
  36. Ma J., Qin L., Li X.. Role of STAT3 Signaling Pathway in Breast Cancer. Cell Commun. Signal. 2020;18(1):33. doi: 10.1186/s12964-020-0527-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Taniguchi K., Tsugane M., Asai A.. A Brief Update on STAT3 Signaling: Current Challenges and Future Directions in Cancer Treatment. J. Cell. Signal. 2021;2(3):181–194. doi: 10.33696/Signaling.2.050. [DOI] [Google Scholar]
  38. Maurer B., Kollmann S., Pickem J., Hoelbl-Kovacic A., Sexl V.. STAT5A and STAT5B: Twins with Different Personalities in Hematopoiesis and Leukemia. Cancers. 2019;11(11):1726. doi: 10.3390/cancers11111726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Bromberg J. F., Wrzeszczynska M. H., Devgan G., Zhao Y., Pestell R. G., Albanese C., Darnell J. E.. Stat3 as an Oncogene. Cell. 1999;98(3):295–303. doi: 10.1016/S0092-8674(00)81959-5. [DOI] [PubMed] [Google Scholar]
  40. Kaplan D. H., Shankaran V., Dighe A. S., Stockert E., Aguet M., Old L. J., Schreiber R. D.. Demonstration of an Interferon γ-Dependent Tumor Surveillance System in Immunocompetent Mice. Proc. Natl. Acad. Sci. U. S. A. 1998;95(13):7556–7561. doi: 10.1073/pnas.95.13.7556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Bromberg J. F., Horvath C. M., Wen Z., Schreiber R. D., Darnell J. E.. Transcriptionally Active Stat1 Is Required for the Antiproliferative Effects of Both Interferon Alpha and Interferon Gamma. Proc. Natl. Acad. Sci. U. S. A. 1996;93(15):7673–7678. doi: 10.1073/pnas.93.15.7673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kumar A., Commane M., Flickinger T. W., Horvath C. M., Stark G. R.. Defective TNF-α-Induced Apoptosis in STAT1-Null Cells Due to Low Constitutive Levels of Caspases. Science. 1997;278(5343):1630–1632. doi: 10.1126/science.278.5343.1630. [DOI] [PubMed] [Google Scholar]
  43. Lee C.-K., Smith E., Gimeno R., Gertner R., Levy D. E.. STAT1 Affects Lymphocyte Survival and Proliferation Partially Independent of Its Role Downstream of IFN-γ. J. Immunol. 2000;164(3):1286–1292. doi: 10.4049/jimmunol.164.3.1286. [DOI] [PubMed] [Google Scholar]
  44. Chin Y. E., Kitagawa M., Su W.-C. S., You Z.-H., Iwamoto Y., Fu X.-Y.. Cell Growth Arrest and Induction of Cyclin-Dependent Kinase Inhibitor p21WAF1/CIP1 Mediated by STAT1. Science. 1996;272(5262):719–722. doi: 10.1126/science.272.5262.719. [DOI] [PubMed] [Google Scholar]
  45. Bromberg J. F., Fan Z., Brown C., Mendelsohn J., Darnell J. E.. Epidermal Growth Factor-Induced Growth Inhibition Requires Stat1 Activation. Cell Growth Differ. 1998;9(7):505–512. [PubMed] [Google Scholar]
  46. Halim C. E., Deng S., Ong M. S., Yap C. T.. Involvement of STAT5 in Oncogenesis. Biomedicines. 2020;8(9):316. doi: 10.3390/biomedicines8090316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Akira S.. Roles of STAT3 Defined by Tissue-Specific Gene Targeting. Oncogene. 2000;19(21):2607–2611. doi: 10.1038/sj.onc.1203478. [DOI] [PubMed] [Google Scholar]
  48. Shakespeare W. C.. SH2 Domain Inhibition: A Problem Solved? Curr. Opin. Chem. Biol. 2001;5(4):409–415. doi: 10.1016/S1367-5931(00)00222-2. [DOI] [PubMed] [Google Scholar]
  49. Polgar T., Baki A., Szendrei G. I., Keserű G. M.. Comparative Virtual and Experimental High-Throughput Screening for Glycogen Synthase Kinase-3β Inhibitors. J. Med. Chem. 2005;48(25):7946–7959. doi: 10.1021/jm050504d. [DOI] [PubMed] [Google Scholar]
  50. de Graaf C., Kooistra A. J., Vischer H. F., Katritch V., Kuijer M., Shiroishi M., Iwata S., Shimamura T., Stevens R. C., de Esch I. J. P., Leurs R.. Crystal Structure-Based Virtual Screening for Fragment-like Ligands of the Human Histamine H1 Receptor. J. Med. Chem. 2011;54(23):8195–8206. doi: 10.1021/jm2011589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. SH2 Domain Targeted Library. https://www.otavachemicals.com/targets/sh2-domain-targeted-library (accessed 2024-11-11).
  52. Jin X., Lee K., Kim N. H., Kim H. S., Yook J. I., Choi J., No K. T.. Natural Products Used as a Chemical Library for Protein-Protein Interaction Targeted Drug Discovery. J. Mol. Graph. Model. 2018;79:46–58. doi: 10.1016/j.jmgm.2017.10.015. [DOI] [PubMed] [Google Scholar]
  53. Atanasov A. G., Zotchev S. B., Dirsch V. M., Supuran C. T.. et al. Natural Products in Drug Discovery: Advances and Opportunities. Nat. Rev. Drug Discovery. 2021;20(3):200–216. doi: 10.1038/s41573-020-00114-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Natural Product-like Compound Libraries for screening. Life Chemicals. https://lifechemicals.com/screening-libraries/natural-product-like-compound-library (accessed 2024-11-11).
  55. Natural Product-like. ChemBridge. https://chembridge.com/targeted-and-specialty-libraries/natural-product-like/ (accessed 2024-11-11).
  56. Screening Libraries (All Libraries). Asinex.com. https://www.asinex.com/screening-libraries-(all-libraries) (accessed 2024-11-11).
  57. Natural Compounds Library. ChemDiv. https://www.chemdiv.com/catalog/focused-and-targeted-libraries/natural-compounds-library/ (accessed 2024-11-11).
  58. Baell J. B., Holloway G. A.. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010;53(7):2719–2740. doi: 10.1021/jm901137j. [DOI] [PubMed] [Google Scholar]
  59. Lipinski C. A., Lombardo F., Dominy B. W., Feeney P. J.. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 1997;23(1):3–25. doi: 10.1016/S0169-409X(96)00423-1. [DOI] [PubMed] [Google Scholar]
  60. Veber D. F., Johnson S. R., Cheng H.-Y., Smith B. R., Ward K. W., Kopple K. D.. Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002;45(12):2615–2623. doi: 10.1021/jm020017n. [DOI] [PubMed] [Google Scholar]
  61. Landrum, G. RDKit: Open-Source Cheminformatics Software. GitHub; 2016. https://github.com/rdkit/rdkit/releases/tag/Release_2016_09_4
  62. Ashton M., Barnard J., Casset F., Charlton M., Downs G., Gorse D., Holliday J., Lahana R., Willett P.. Identification of Diverse Database Subsets Using Property-Based and Fragment-Based Molecular Descriptions. Quant. Struct.-Act. Relatsh. 2002;21(6):598–604. doi: 10.1002/qsar.200290002. [DOI] [Google Scholar]
  63. Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007); Springer, 2007. [Google Scholar]
  64. Zdrazil B., Felix E., Hunter F., Manners E. J., Blackshaw J., Corbett S., de Veij M., Ioannidis H., Lopez D. M., Mosquera J. F., Magarinos M. P., Bosc N., Arcila R., Kizilören T., Gaulton A., Bento A. P., Adasme M. F., Monecke P., Landrum G. A., Leach A. R.. The ChEMBL Database in 2023: A Drug Discovery Platform Spanning Multiple Bioactivity Data Types and Time Periods. Nucleic Acids Res. 2024;52(D1):D1180–D1192. doi: 10.1093/nar/gkad1004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Mysinger M. M., Carchia M., Irwin J. J., Shoichet B. K.. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 2012;55(14):6582–6594. doi: 10.1021/jm300687e. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Madhavi Sastry G., Adzhigirey M., Day T., Annabhimoju R., Sherman W.. Protein and Ligand Preparation: Parameters, Protocols, and Influence on Virtual Screening Enrichments. J. Comput. Aided Mol. Des. 2013;27(3):221–234. doi: 10.1007/s10822-013-9644-8. [DOI] [PubMed] [Google Scholar]
  67. Friesner R. A., Banks J. L., Murphy R. B., Halgren T. A., Klicic J. J., Mainz D. T., Repasky M. P., Knoll E. H., Shelley M., Perry J. K., Shaw D. E., Francis P., Shenkin P. S.. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004;47(7):1739–1749. doi: 10.1021/jm0306430. [DOI] [PubMed] [Google Scholar]
  68. Halgren T. A., Murphy R. B., Friesner R. A., Beard H. S., Frye L. L., Pollard W. T., Banks J. L.. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening. J. Med. Chem. 2004;47(7):1750–1759. doi: 10.1021/jm030644s. [DOI] [PubMed] [Google Scholar]
  69. Belo Y., Mielko Z., Nudelman H., Afek A., Ben-David O., Shahar A., Zarivach R., Gordan R., Arbely E.. Unexpected Implications of STAT3 Acetylation Revealed by Genetic Encoding of Acetyl-Lysine. Biochim. Biophys. Acta BBA - Gen. Subj. 2019;1863(9):1343–1350. doi: 10.1016/j.bbagen.2019.05.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Wang Y., Ren X., Deng C., Yang L., Yan E., Guo T., Li Y., Xu M. X.. Mechanism of the Inhibition of the STAT3 Signaling Pathway by EGCG. Oncol. Rep. 2013;30(6):2691–2696. doi: 10.3892/or.2013.2743. [DOI] [PubMed] [Google Scholar]
  71. Ropp P. J., Kaminsky J. C., Yablonski S., Durrant J. D.. Dimorphite-DL: An Open-Source Program for Enumerating the Ionization States of Drug-like Small Molecules. J. Cheminformatics. 2019;11(1):14. doi: 10.1186/s13321-019-0336-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. RDKit. https://www.rdkit.org/ (accessed 2024-11-11).
  73. Morris G. M., Huey R., Lindstrom W., Sanner M. F., Belew R. K., Goodsell D. S., Olson A. J.. AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility. J. Comput. Chem. 2009;30(16):2785–2791. doi: 10.1002/jcc.21256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Santos-Martins D., Solis-Vasquez L., Tillack A. F., Sanner M. F., Koch A., Forli S.. Accelerating AutoDock4 with GPUs and Gradient-Based Local Search. J. Chem. Theory Comput. 2021;17(2):1060–1073. doi: 10.1021/acs.jctc.0c01006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Mihalovits L. M., Szalai T. V., Bajusz D., Keserű G. M.. Exploring Chemical Spaces in the Billion Range: Is Docking a Computational Alternative to DNA-Encoded Libraries? J. Chem. Inf. Model. 2024;64(23):8963–8979. doi: 10.1021/acs.jcim.4c00803. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. de Araujo E. D., Erdogan F., Neubauer H. A., Meneksedag-Erol D., Manaswiyoungkul P., Eram M. S., Seo H.-S., Qadree A. K., Israelian J., Orlova A., Suske T., Pham H. T. T., Boersma A., Tangermann S., Kenner L., Rülicke T., Dong A., Ravichandran M., Brown P. J., Audette G. F., Rauscher S., Dhe-Paganon S., Moriggl R., Gunning P. T.. Structural and Functional Consequences of the STAT5B N642H Driver Mutation. Nat. Commun. 2019;10(1):2517. doi: 10.1038/s41467-019-10422-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Cumaraswamy A. A., Lewis A. M., Geletu M., Todic A., Diaz D. B., Cheng X. R., Brown C. E., Laister R. C., Muench D., Kerman K., Grimes H. L., Minden M. D., Gunning P. T.. Nanomolar-Potency Small Molecule Inhibitor of STAT5 Protein. ACS Med. Chem. Lett. 2014;5(11):1202–1206. doi: 10.1021/ml500165r. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Page B. D. G., Khoury H., Laister R. C., Fletcher S., Vellozo M., Manzoli A., Yue P., Turkson J., Minden M. D., Gunning P. T.. Small Molecule STAT5-SH2 Domain Inhibitors Exhibit Potent Antileukemia Activity. J. Med. Chem. 2012;55(3):1047–1055. doi: 10.1021/jm200720n. [DOI] [PubMed] [Google Scholar]
  79. Elumalai N., Berg A., Natarajan K., Scharow A., Berg T.. Nanomolar Inhibitors of the Transcription Factor STAT5b with High Selectivity over STAT5a. Angew. Chem., Int. Ed. 2015;54(16):4758–4763. doi: 10.1002/anie.201410672. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Elumalai N., Berg A., Rubner S., Blechschmidt L., Song C., Natarajan K., Matysik J., Berg T.. Rational Development of Stafib-2: A Selective, Nanomolar Inhibitor of the Transcription Factor STAT5b. Sci. Rep. 2017;7(1):819. doi: 10.1038/s41598-017-00920-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Müller J., Sperl B., Reindl W., Kiessling A., Berg T.. Discovery of Chromone-Based Inhibitors of the Transcription Factor STAT5. ChemBioChem. 2008;9(5):723–727. doi: 10.1002/cbic.200700701. [DOI] [PubMed] [Google Scholar]
  82. Gilliland D. G., Griffin J. D.. The Roles of FLT3 in Hematopoiesis and Leukemia. Blood. 2002;100(5):1532–1542. doi: 10.1182/blood-2002-02-0492. [DOI] [PubMed] [Google Scholar]
  83. Jacobson M. P., Friesner R. A., Xiang Z., Honig B.. On the Role of the Crystal Environment in Determining Protein Side-Chain Conformations. J. Mol. Biol. 2002;320(3):597–608. doi: 10.1016/S0022-2836(02)00470-9. [DOI] [PubMed] [Google Scholar]
  84. Jacobson M. P., Pincus D. L., Rapp C. S., Day T. J. F., Honig B., Shaw D. E., Friesner R. A.. A Hierarchical Approach to All-Atom Protein Loop Prediction. Proteins Struct. Funct. Bioinforma. 2004;55(2):351–367. doi: 10.1002/prot.10613. [DOI] [PubMed] [Google Scholar]
  85. Hu T., Yeh J. E., Pinello L., Jacob J., Chakravarthy S., Yuan G.-C., Chopra R., Frank D. A.. Impact of the N-Terminal Domain of STAT3 in STAT3-Dependent Transcriptional Activity. Mol. Cell. Biol. 2015;35(19):3284–3300. doi: 10.1128/MCB.00060-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Weber E., Abranyi-Balogh P., Nagymihaly B., Menyhard D. K., Peczka N., Gadanecz M., Schlosser G., Orgovan Z., Bogar F., Bajusz D., Kecskemeti G., Szabo Z., Bartus E., Tokoli A., Toth G. K., Szalai T. V., Takacs T., de Araujo E., Buday L., Perczel A., Martinek T. A., Keseru G. M.. Target-Templated Construction of Functional Proteomimetics Using Photo-Foldamer Libraries. Angew. Chem., Int. Ed. 2025;64(2):e202410435. doi: 10.1002/anie.202410435. [DOI] [PubMed] [Google Scholar]
  87. de Araujo E. D., Manaswiyoungkul P., Israelian J., Park J., Yuen K., Farhangi S., Berger-Becvar A., Abu-Jazar L., Gunning P. T.. High-Throughput Thermofluor-Based Assays for Inhibitor Screening of STAT SH2 Domains. J. Pharm. Biomed. Anal. 2017;143:159–167. doi: 10.1016/j.jpba.2017.04.052. [DOI] [PubMed] [Google Scholar]
  88. Abranyi-Balogh P., Bajusz D., Orgovan Z., Keeley A. B., Petri L., Peczka N., Szalai T. V., Palfy G., Gadanecz M., Grant E. K., Imre T., Takacs T., Ranđelovic I., Baranyi M., Marton A., Schlosser G., Ashraf Q. F., de Araujo E. D., Karancsi T., Buday L., Tovari J., Perczel A., Bush J. T., Keseru G. M.. Mapping Protein Binding Sites by Photoreactive Fragment Pharmacophores. Commun. Chem. 2024;7(1):1–13. doi: 10.1038/s42004-024-01252-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Dammann M., Kramer M., Zimmermann M. O., Boeckler F. M.. Quadruple Target Evaluation of Diversity-Optimized Halogen-Enriched Fragments (HEFLibs) Reveals Substantial Ligand Efficiency for AP2-Associated Protein Kinase 1 (AAK1). Front. Chem. 2022;9:815567. doi: 10.3389/fchem.2021.815567. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Nagini, S. Neem Limonoids as Anticancer Agents: Modulation of Cancer Hallmarks and Oncogenic Signaling. In The Enzymes; Bathaie, S. Z., Tamanoi, F., Eds.; Natural Products and Cancer Signaling: Isoprenoids, Polyphenols and Flavonoids; Academic Press, 2014; Vol. 36, pp 131–147. doi: 10.1016/B978-0-12-802215-3.00007-0. [DOI] [PubMed] [Google Scholar]
  91. Liu, B.; Badali, D.; Fletcher, S.; Avadisian, M.; Gunning, P.; Gradinaru, C. Single-Molecule Fluorescence Study of the Inhibition of the Oncogenic Functionality of STAT3. In Photonics North 2009; SPIE, 2009; Vol. 7386, pp 39–45. doi: 10.1117/12.838943. [DOI] [Google Scholar]
  92. Lis C., Rubner S., Roatsch M., Berg A., Gilcrest T., Fu D., Nguyen E., Schmidt A.-M., Krautscheid H., Meiler J., Berg T.. Development of Erasin: A Chromone-Based STAT3 Inhibitor Which Induces Apoptosis in Erlotinib-Resistant Lung Cancer Cells. Sci. Rep. 2017;7(1):17390. doi: 10.1038/s41598-017-17600-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Able A. A., Burrell J. A., Stephens J. M.. STAT5-Interacting Proteins: A Synopsis of Proteins That Regulate STAT5 Activity. Biology. 2017;6(1):20. doi: 10.3390/biology6010020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Vogt M., Domoszlai T., Kleshchanok D., Lehmann S., Schmitt A., Poli V., Richtering W., Müller-Newen G.. The Role of the N-Terminal Domain in Dimerization and Nucleocytoplasmic Shuttling of Latent STAT3. J. Cell Sci. 2011;124(6):900–909. doi: 10.1242/jcs.072520. [DOI] [PubMed] [Google Scholar]
  95. Haftchenary S., Luchman H. A., Jouk A. O., Veloso A. J., Page B. D. G., Cheng X. R., Dawson S. S., Grinshtein N., Shahani V. M., Kerman K., Kaplan D. R., Griffin C., Aman A. M., Al-awar R., Weiss S., Gunning P. T.. Potent Targeting of the STAT3 Protein in Brain Cancer Stem Cells: A Promising Route for Treating Glioblastoma. ACS Med. Chem. Lett. 2013;4(11):1102–1107. doi: 10.1021/ml4003138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Mandal P. K., Liao W. S.-L., McMurray J. S.. Synthesis of Phosphatase-Stable, Cell-Permeable Peptidomimetic Prodrugs That Target the SH2 Domain of Stat3. Org. Lett. 2009;11(15):3394–3397. doi: 10.1021/ol9012662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Mandal P. K., Gao F., Lu Z., Ren Z., Ramesh R., Birtwistle J. S., Kaluarachchi K. K., Chen X., Bast R. C. Jr, Liao W. S., McMurray J. S.. Potent and Selective Phosphopeptide Mimetic Prodrugs Targeted to the Src Homology 2 (SH2) Domain of Signal Transducer and Activator of Transcription 3. J. Med. Chem. 2011;54(10):3549–3563. doi: 10.1021/jm2000882. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Zhang X., Yue P., Page B. D. G., Li T., Zhao W., Namanja A. T., Paladino D., Zhao J., Chen Y., Gunning P. T., Turkson J.. Orally Bioavailable Small-Molecule Inhibitor of Transcription Factor Stat3 Regresses Human Breast and Lung Cancer Xenografts. Proc. Natl. Acad. Sci. U. S. A. 2012;109(24):9623–9628. doi: 10.1073/pnas.1121606109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Schust J., Sperl B., Hollis A., Mayer T. U., Berg T.. Stattic: A Small-Molecule Inhibitor of STAT3 Activation and Dimerization. Chem. Biol. 2006;13(11):1235–1242. doi: 10.1016/j.chembiol.2006.09.018. [DOI] [PubMed] [Google Scholar]
  100. Elumalai N., Berg A., Rubner S., Berg T.. Phosphorylation of Capsaicinoid Derivatives Provides Highly Potent and Selective Inhibitors of the Transcription Factor STAT5b. ACS Chem. Biol. 2015;10(12):2884–2890. doi: 10.1021/acschembio.5b00817. [DOI] [PubMed] [Google Scholar]
  101. Fletcher S., Turkson J., Gunning P. T.. Molecular Approaches towards the Inhibition of the Signal Transducer and Activator of Transcription 3 (Stat3) Protein. ChemMedChem. 2008;3(8):1159–1168. doi: 10.1002/cmdc.200800123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Irwin J. J., Duan D., Torosyan H., Doak A. K., Ziebart K. T., Sterling T., Tumanian G., Shoichet B. K.. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015;58(17):7076–7087. doi: 10.1021/acs.jmedchem.5b01105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Elumalai N., Natarajan K., Berg T.. Halogen-Substituted Catechol Bisphosphates Are Potent and Selective Inhibitors of the Transcription Factor STAT5b. Bioorg. Med. Chem. 2017;25(14):3871–3882. doi: 10.1016/j.bmc.2017.05.039. [DOI] [PubMed] [Google Scholar]
  104. Bosc D., Camberlein V., Gealageas R., Castillo-Aguilera O., Deprez B., Deprez-Poulain R.. Kinetic Target-Guided Synthesis: Reaching the Age of Maturity. J. Med. Chem. 2020;63(8):3817–3833. doi: 10.1021/acs.jmedchem.9b01183. [DOI] [PubMed] [Google Scholar]
  105. Zhu T., Cao S., Su P.-C., Patel R., Shah D., Chokshi H. B., Szukala R., Johnson M. E., Hevener K. E.. Hit Identification and Optimization in Virtual Screening: Practical Recommendations Based on a Critical Literature Analysis. J. Med. Chem. 2013;56(17):6560–6572. doi: 10.1021/jm301916b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Carta G., Knox A. J. S., Lloyd D. G.. Unbiasing Scoring Functions: A New Normalization and Rescoring Strategy. J. Chem. Inf. Model. 2007;47(4):1564–1571. doi: 10.1021/ci600471m. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

ci5c00907_si_001.xlsx (8.9MB, xlsx)
ci5c00907_si_002.pdf (845.7KB, pdf)

Data Availability Statement

Training sets, docking and bioassay results, and the source data for the figures are provided in the Supplementary Data file. Scripts used for the Deep Docking workflow are available at https://github.com/keserulab/uHTVS_toolkit.


Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
