ACS Omega. 2025 Sep 22;10(38):44125–44136. doi: 10.1021/acsomega.5c05380

A Multi-input Deep Learning Architecture for STAT3 Inhibitor Prediction

Kairui Liang ‡,§, Wenling Qin †,*, Yonghong Zhang ‡,*
PMCID: PMC12489654  PMID: 41048810

Abstract

Signal transducer and activator of transcription 3 (STAT3) is a critical factor involved in various physiological and oncogenic signaling pathways. Machine learning models are valuable tools for predicting or screening STAT3 inhibitors. However, the predictive performance and interpretability of existing models still require improvement. In this study, we introduce a fingerprint-enhanced graph (FPG) attention network model, which integrates sequence-based fingerprints and structure-based graph representations to predict STAT3 inhibitors. During the feature learning process, the FPG model converts sequence information into a fingerprint vector, while structural information is encoded into a separate vector using a graph attention network module. These two vectors are then concatenated and passed through a multilayer perceptron for molecular activity classification. Among 49 models with various representations and algorithm combinations, the FPG-based model achieved the best predictive performance, with an average area under the curve of 0.897 on the test set. Furthermore, the model outperformed existing prediction models for identifying STAT3 inhibitors. Additionally, fingerprint analysis and attention heatmaps, combined with SHAP algorithms, provided valuable insights into the structure–activity relationship of STAT3 inhibitors, enhancing model interpretability. To facilitate related research and applications, we developed a web service (STAT3 Pro: https://gzliang.cqu.edu.cn/software/Stat3Pro.html) for STAT3 inhibitor prediction.



1. Introduction

STAT3 is a cytoplasmic transcription factor that participates in normal cellular events, including differentiation, proliferation, and angiogenesis. STAT3 is activated in response to various cytokines, chemokines, and growth factors. Preclinical and clinical evidence has confirmed that STAT3 is a promising therapeutic target. Because of the critical role of aberrant STAT3 expression and/or activation in cancer development, as well as its significance in other pathologies such as rheumatoid arthritis, atherosclerosis, inflammatory bowel disease, psoriasis, and pulmonary fibrosis, the search for STAT3 inhibitors is an active area of medicinal chemistry.

A wide range of direct small-molecule STAT3 inhibitors have been identified, targeting STAT3 dimerization or its DNA binding, including peptides, peptidomimetics, and oligonucleotides. Several FDA-approved drugs, such as celecoxib, BBI608, and pyrimethamine, are currently undergoing clinical trials for cancer immunotherapy. However, developing STAT3 inhibitors with high efficiency and selectivity remains a significant challenge.

Recent advances in STAT3 inhibitor discovery include (1) virtual screening combined with molecular simulations/docking to identify covalent inhibitors; (2) pharmacophore-guided approaches for liver fibrosis therapy; (3) structure-based virtual screening and binding pose metadynamics to discover allosteric covalent inhibitors targeting the coiled-coil and DNA-binding domains of STAT3, with optimization yielding low micromolar inhibitors; and (4) high-throughput virtual screening with MM/GBSA and binding pose metadynamics to identify novel SH2 domain inhibitors with anti-triple-negative breast cancer activity, validated through in vitro and mechanistic studies.

It is important to note that machine learning (ML) presents valuable strategies for ligand-based drug design, as it can leverage known activity data to predict new drug candidates. Nonetheless, because many ML algorithms are black-box models, achieving both high predictive accuracy and interpretability remains a major challenge in AI-driven drug development. Effective molecular representations are crucial for developing drug discovery models that offer both precision and interpretability.

Here we develop a computational model for predicting STAT3 inhibitors. We introduce a novel molecular characterization network model, FPG, which integrates molecular fingerprinting with graph neural networks. Our method differs fundamentally from FP-GNN, an existing method that also combines fingerprints and graph neural networks: (i) FP-GNN sequentially combines fingerprint and graph features, whereas our FPG framework processes them in parallel through attention-based coupling; (ii) FPG employs dynamic graph attention to weight atomic contributions adaptively, in contrast to FP-GNN’s static graph convolutions. These innovations enable more flexible feature integration and context-aware atomic representation learning. Through comparisons with various ML and deep learning (DL) algorithms, as well as existing STAT3 inhibitor prediction servers, we demonstrate that FPG not only delivers robust predictive performance but also effectively captures the key structural features critical for STAT3 inhibition. To support the scientific community, we provide a computational tool, “STAT3 Pro” (https://gzliang.cqu.edu.cn/software/Stat3Pro.html), designed to predict and design potential STAT3 inhibitor candidates.

2. Materials and Methods

2.1. Data Source

In the PubChem database (https://pubchem.ncbi.nlm.nih.gov), a search using the keyword “STAT3 inhibitors” identified 251 biological assay methods for evaluating inhibitor activity. Among these, assay AID 862 generated the largest number of inhibitors. Therefore, we selected inhibitor samples identified by this method, resulting in a total of 1724 STAT3 inhibitors and 192,974 noninhibitors. To balance the positive and negative samples and avoid bias in model performance, we applied under-sampling, ultimately obtaining a dataset consisting of 1565 inhibitors and 1671 noninhibitors. To assess the reliability of the model, we extracted 431 inhibitors and 456 noninhibitors from the ChEMBL database (https://www.ebi.ac.uk/chembl/) as an external validation set.
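The class-balancing step described above can be sketched as follows. This is a minimal illustration that assumes simple random under-sampling of the majority class; the text does not state which under-sampling procedure was actually used, and the function name `undersample`, the `ratio` parameter, and the seed are hypothetical.

```python
import random

def undersample(positives, negatives, ratio=1.0, seed=0):
    """Randomly keep at most ratio * len(positives) negatives (assumed scheme)."""
    rng = random.Random(seed)
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    kept_negatives = rng.sample(negatives, n_keep)  # sampling without replacement
    return list(positives), kept_negatives

# Toy placeholder identifiers, not the actual PubChem AID 862 records.
actives = [f"active_{i}" for i in range(10)]
inactives = [f"inactive_{i}" for i in range(1000)]
pos, neg = undersample(actives, inactives, ratio=1.0, seed=42)
print(len(pos), len(neg))  # 10 10
```

A `ratio` slightly above 1 would give a mildly imbalanced set, comparable to the 1565 inhibitors and 1671 noninhibitors reported here.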

2.2. Molecular Representation

In this study, various molecular structural representations (Figure 1) are used to characterize the molecular structure, including molecular fingerprinting, sequence-based embedding, molecular images, and molecular graphs. Based on these, a novel molecular structural representation method, fingerprint-enhanced graph (FPG), is proposed, which integrates molecular fingerprinting and graph neural networks. The details are as follows:

Figure 1. Schematic representation of four commonly used molecular characterization modalities.

  • (i)

    Molecular fingerprinting: Three types of molecular fingerprints are calculated, including dictionary-based fingerprints, path/topology-based fingerprints, and extended connectivity fingerprints. These fingerprints are used to represent the molecular physicochemical properties, spatial information, and topological structure. All fingerprints are generated using the RDKit package.

  • (ii)

    Sequence-based embedding: The SMILES strings, which are used to specify chemical molecules, are first converted into corresponding one-hot encoded vectors. These one-hot codes are then passed through an embedding layer to reduce their dimensionality. The resulting lower-dimensional embeddings are sequentially fed into a natural language model. In this case, we use a multilayer bidirectional GRU layer and a long short-term memory (LSTM) layer to capture the forward and backward dependencies within the SMILES strings. A fully connected layer is then used to nonlinearly combine the learned forward–backward features of the SMILES strings, completing the training of the chemical molecule attribute prediction model.

  • (iii)

    Molecular image: To create an image-based representation of the molecules, the SMILES strings of all small molecules were first processed using the RDKit software package to convert them into a standardized canonical SMILES format. These canonical SMILES strings were then transformed into 32 × 32 pixel color images in RGB format. Each image corresponded to a 32 × 32 × 3 three-dimensional RGB data matrix, with pixel values normalized to the range of 0 to 1. These normalized matrices were subsequently used as input data for the modeling process.

  • (iv)

    Molecular graph: A molecular graph represents small molecules as undirected graphs, where nodes correspond to atoms and edges represent chemical bonds. SMILES sequences of small molecules are first converted into graphs using the RDKit package. Each node encodes seven types of information (35-dimensional one-hot vectors) about an atom, including atom type, number of neighboring atoms, number of neighboring hydrogen atoms, formal charge, explicit valence, hybridization type, and whether the atom is part of an aromatic ring. The molecular graph is composed of an adjacency matrix A (N × N) and a feature matrix X (N × 35), where N represents the number of atoms and 35 corresponds to the number of feature dimensions (Table 1).

  • (v)

    Fingerprint-enhanced graph (FPG): The proposed molecular characterization combines sequence-based fingerprint features with molecular graph-based structural features. In the fingerprint layer, the combination of MACCS and Morgan fingerprints is generated from the SMILES sequences of the molecules. The MACCS fingerprint is a standard dictionary-based molecular fingerprint consisting of 166 SMILES Arbitrary Target Specification (SMARTS) keys. The structure keys encode the molecular structure as a binary bit string, where each bit corresponds to a predefined substructure or fragment. If the molecule contains the predefined feature, the corresponding bit is set to 1; otherwise, it is set to 0. Morgan fingerprints, on the other hand, generate a bit vector representation by considering the local environment of each atom, with the radius of the environment directly related to the number of iterations. The combined fingerprint vector is then processed through a feed-forward neural network to obtain the fingerprint feature vector for each individual molecule. For the molecular graph component, the graph information is as described in Table 1. After the graph convolution computations, a molecular structural feature vector is obtained, and the coupled feature is derived by combining the fingerprint and graph-based feature vectors.

Table 1. Atomic Features Used in FPG.

| node feature | dimension | content |
| --- | --- | --- |
| atom type | 10 | C, N, O, P, S, F, Cl, Br, I, Si |
| degree of atom | 6 | 0, 1, 2, 3, 4, 5 |
| hybridization type | 3 | sp, sp2, sp3 |
| total Hs | 5 | 0, 1, 2, 3, 4 |
| aromaticity | 1 | Yes/No |
| formal charge | 3 | 1, 0, −1 |
| explicit valence | 7 | 0, 1, 2, 3, 4, 5, 6 |
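To make the 35-dimensional node encoding concrete, the sketch below builds a one-hot feature vector from the seven feature groups of the atomic feature table. It is an illustration only: the ordering of the groups within the vector, the all-zero encoding of out-of-vocabulary values, and the names `one_hot` and `atom_features` are assumptions, and the atom properties are passed in by hand rather than read from RDKit.

```python
# Vocabularies taken from the atomic feature table (10 + 6 + 3 + 5 + 1 + 3 + 7 = 35).
ATOM_TYPES = ["C", "N", "O", "P", "S", "F", "Cl", "Br", "I", "Si"]
DEGREES = [0, 1, 2, 3, 4, 5]
HYBRIDIZATIONS = ["SP", "SP2", "SP3"]
TOTAL_HS = [0, 1, 2, 3, 4]
FORMAL_CHARGES = [1, 0, -1]
EXPLICIT_VALENCES = [0, 1, 2, 3, 4, 5, 6]

def one_hot(value, choices):
    """One-hot encode value; values outside choices give all zeros (assumption)."""
    return [1 if value == c else 0 for c in choices]

def atom_features(symbol, degree, hybridization, total_hs,
                  aromatic, formal_charge, explicit_valence):
    """Concatenate the seven feature groups into one 35-dimensional vector."""
    return (
        one_hot(symbol, ATOM_TYPES)
        + one_hot(degree, DEGREES)
        + one_hot(hybridization, HYBRIDIZATIONS)
        + one_hot(total_hs, TOTAL_HS)
        + [1 if aromatic else 0]
        + one_hot(formal_charge, FORMAL_CHARGES)
        + one_hot(explicit_valence, EXPLICIT_VALENCES)
    )

# Hand-specified values for an aromatic CH carbon (illustrative input only).
vec = atom_features("C", 2, "SP2", 1, True, 0, 3)
print(len(vec), sum(vec))  # 35 7
```

Stacking one such vector per atom gives the N × 35 feature matrix X that accompanies the adjacency matrix A.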

2.3. Model Construction

2.3.1. Graph Convolution Layer

As illustrated in Figure 2A, the fundamental concept of the graph convolution layer is to update each node’s representation by aggregating information from neighboring nodes, also known as message passing. A state embedding vector $h_i$ is introduced for each node to receive information from neighboring nodes during the iteration process, as shown in eq 1:

$$h_i = \mathrm{Aggregate}\Big(h_i, \sum_{j \in N(i)} h_j\Big) \tag{1}$$

where $h_i$ represents the feature vector of the node itself, $h_j$ represents the feature vector of a neighboring node, and $N(i)$ denotes the set of nodes neighboring node $i$. After several iterations, the comprehensive output feature vector $H$ is generated, which aggregates the features $h_i$ of all nodes in the molecular graph $G$ (eq 2):

$$H = \mathrm{Readout}\Big(\sum_{i \in G} h_i\Big) \tag{2}$$

Figure 2. Model architecture of the FPG model. The FPG model combines molecular graph features (processed by graph attention network) and fingerprint vectors (MACCS + Morgan) through a three-layer neural network. These fused features are classified via multilayer perceptron to predict STAT3 inhibitors, with multihead attention providing interpretable atomic contributions.

The aggregation operator, Aggregate, represents the fundamental component of the message-passing mechanism in eq 1. In this study, the attention mechanism is introduced as the aggregation operator in the message collection phase of the graph. Given $N$ node features, where each node feature has dimension $F$, the node feature inputs are as shown in eq 3:

$$h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^{F} \tag{3}$$

A shared linear transformation is applied to each node feature $\vec{h}_i$, parametrized by the weight matrix $W^{(l)} \in \mathbb{R}^{F' \times F}$, as shown in eq 4:

$$\vec{z}_i^{(l)} = W^{(l)} \vec{h}_i^{(l)} \tag{4}$$

The pairwise unnormalized score between two neighboring embedding vectors $\vec{z}_i^{(l)}$ and $\vec{z}_j^{(l)}$ is computed by concatenating the two embeddings and taking the dot product with a learnable weight vector $\vec{a}^{(l)T}$. The LeakyReLU activation function is then applied to generate the initial attention coefficients $e_{ij}$, as detailed in eq 5:

$$e_{ij}^{(l)} = \mathrm{LeakyReLU}\Big(\vec{a}^{(l)T}\big[\vec{z}_i^{(l)} \,\|\, \vec{z}_j^{(l)}\big]\Big) \tag{5}$$

where $\|$ denotes vector concatenation. The attention coefficient $e_{ij}$ indicates the importance of node $j$ to node $i$, which is evaluated by the shared attention mechanism. This graph-based attention mechanism is known as the masked attention mechanism, which typically employs a single-layer feed-forward neural network activated by LeakyReLU as the hidden layer. To make attention coefficients comparable across different nodes, the Softmax function is used for normalization, resulting in the final attention coefficient $\alpha_{ij}$ as shown in eq 6:

$$\alpha_{ij}^{(l)} = \frac{\exp\big(e_{ij}^{(l)}\big)}{\sum_{k \in N_i} \exp\big(e_{ik}^{(l)}\big)} \tag{6}$$

Once the attention coefficients over all neighboring nodes of the current node have been computed, the node representation vector is updated using these coefficients as weights, as illustrated in eq 7:

$$\vec{h}_i^{(l+1)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{(l)} \vec{z}_j^{(l)}\Big) \tag{7}$$

Here the multihead self-attention mechanism is employed, with the heads differing in their independently initialized weight matrices $W$. The outputs of the heads (five in eq 8) are averaged to give the final feature output of the graph, as shown in eq 8:

$$h^{(l+1)} = \mathrm{avg}\big(h^{(l+1),1}, h^{(l+1),2}, h^{(l+1),3}, h^{(l+1),4}, h^{(l+1),5}\big) \tag{8}$$
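The attention-based aggregation in eqs 4–8 can be traced on a toy graph with the dependency-free sketch below. It is not the FPG implementation: the weights are hand-picked rather than learned, tanh stands in for the unspecified nonlinearity σ, each node is assumed to be listed among its own neighbors, and the names `gat_head` and `multi_head_average` are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gat_head(h, neighbors, W, a, activation=math.tanh):
    """One attention head: returns updated node features and attention weights."""
    z = [matvec(W, h_i) for h_i in h]                       # eq 4
    out, alphas = [], []
    for i, nbrs in enumerate(neighbors):
        # eq 5: score from the concatenation [z_i || z_j] dotted with a
        e = [leaky_relu(sum(a_k * v for a_k, v in zip(a, z[i] + z[j])))
             for j in nbrs]
        # eq 6: softmax over the neighborhood (shifted by max for stability)
        e_max = max(e)
        exp_e = [math.exp(x - e_max) for x in e]
        total = sum(exp_e)
        alpha = [x / total for x in exp_e]
        # eq 7: attention-weighted sum of neighbor embeddings, then nonlinearity
        agg = [sum(al * z[j][d] for al, j in zip(alpha, nbrs))
               for d in range(len(z[i]))]
        out.append([activation(x) for x in agg])
        alphas.append(alpha)
    return out, alphas

def multi_head_average(h, neighbors, heads):
    """eq 8: average the outputs of all heads, node by node."""
    outs = [gat_head(h, neighbors, W, a)[0] for W, a in heads]
    return [[sum(o[i][d] for o in outs) / len(outs)
             for d in range(len(outs[0][i]))]
            for i in range(len(h))]

# Toy 3-node path graph 0-1-2 with self-loops; F = F' = 2.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4]),
    ([[0.5, -0.5], [0.5, 0.5]], [0.4, 0.3, 0.2, 0.1]),
]
h_next = multi_head_average(h, neighbors, heads)
_, alphas = gat_head(h, neighbors, *heads[0])
print(len(h_next), len(h_next[0]))  # 3 2
```

The per-node lists in `alphas` each sum to 1, which is the property that makes attention scores comparable across atoms when they are later drawn as heatmaps.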

2.3.2. Molecular Fingerprint Layer

As shown in Figure 2B, the hybrid fingerprint is generated by concatenating the MACCS and Morgan fingerprints, as shown in eq 9:

$$FP_{\mathrm{mixed}} = \big(FP_{\mathrm{Morgan}} \,\|\, FP_{\mathrm{MACCS}}\big) \tag{9}$$

A three-layer feed-forward neural network is utilized to capture the information in the molecular fingerprint. The computation of a single layer of this network is illustrated in eq 10:

$$V_{FP} = w \cdot FP + b \tag{10}$$

The concatenated fingerprint is passed through the hidden layers to obtain the final feature vector representing the molecular fingerprint.

2.4. Hyperparameter Optimization

In this study, a Bayesian optimization strategy was used to search for hyperparameter combinations. The FPG model optimized six hyperparameters: Graph Neural Network (GNN) Dropout, Head Count, Attention Size, Fingerprint-Enhanced Network (FPN) Dropout, FPN Size, and GNN Ratio. By default, the hyperparameter optimization process ran for 20 iterations, and the best-performing set of hyperparameters and its evaluation results were reported.

To avoid overfitting, a dropout strategy was employed that ignores some of the hidden layer nodes in each training batch, with a retention probability between 0.1 and 0.9. An early stopping strategy was also set to terminate training if the monitored validation metric had not improved within a predefined number of epochs. The model used the Adam optimizer for gradient descent, and binary cross-entropy was used as the loss function for the classification task. Because the learning rate strongly affects model optimization, we added a learning rate scheduler with a learning rate decay of 0.1 every five steps.
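The learning rate schedule and early stopping rule can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the decay is taken to be multiplicative (the rate is multiplied by 0.1 every five steps), the base learning rate of 1e-3, the patience of 3, and the loss values are made-up illustration values, and `step_decay_lr` and `EarlyStopping` are hypothetical names rather than the authors' code.

```python
def step_decay_lr(base_lr, step, step_size=5, gamma=0.1):
    """Multiply the learning rate by gamma once every step_size steps (assumed form)."""
    return base_lr * gamma ** (step // step_size)

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def update(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses, used only to exercise the logic.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    lr = step_decay_lr(1e-3, epoch)  # would be handed to the optimizer
    if stopper.update(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 5
```

In this toy run the loss stops improving after epoch 2, so training halts at epoch 5 and the later improvement at epoch 6 is never seen, which illustrates why the patience window is itself a tunable choice.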

2.5. Performance Evaluation Parameters

The model’s predictive performance was evaluated using the following statistical variables:

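$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} \tag{11}$$

$$\mathrm{Pre} = \frac{TP}{TP + FP} \tag{12}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{14}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{15}$$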

Here TP, FP, TN, and FN are true positive, false positive, true negative, and false negative, respectively. Accuracy (Acc) is the ratio of correctly classified samples to the total number of samples. Precision (Pre) is the proportion of true positives among all positive predictions. Recall, also called Sensitivity (Sen) or the true positive rate, is the proportion of actual positive cases that are correctly predicted. Precision and Recall typically trade off against each other. Specificity (Spe), equal to the true negative rate, is the proportion of actual negatives that are correctly identified. MCC is the Matthews correlation coefficient. The area under the curve (AUC) is the area under the receiver operating characteristic (ROC) curve.
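As a worked example of eqs 11–15, the following sketch computes the five count-based metrics from a confusion matrix. The counts are invented for illustration and are not taken from this study; returning 0.0 when a denominator is zero is a convention assumed here, and the function name is hypothetical.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute Acc, Pre, Recall, Spe, and MCC from confusion-matrix counts."""
    def safe_div(num, den):
        return num / den if den else 0.0  # assumed convention for empty classes

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": safe_div(tp + tn, tp + fp + tn + fn),
        "Pre": safe_div(tp, tp + fp),
        "Recall": safe_div(tp, tp + fn),
        "Spe": safe_div(tn, tn + fp),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Invented counts, used only to exercise the formulas.
m = classification_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts Acc is 150/200 = 0.75 and Pre is 80/100 = 0.8, while MCC stays near 0.50 because it penalizes errors in both classes.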

Five-fold cross-validation is applied to trade off the bias and variance of the observed performance metrics so that the obtained metrics would be more reliable.
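A five-fold split of the balanced dataset can be sketched as below. This is an assumption-laden illustration: the text does not say whether the folds were stratified or how they were shuffled, so the sketch uses a plain shuffled split, and `k_fold_indices` and the seed are hypothetical. The sample count of 3236 is the sum of the 1565 inhibitors and 1671 noninhibitors reported in Section 2.1.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

splits = list(k_fold_indices(3236, k=5, seed=1))
for fold_id, (train, validation) in enumerate(splits, start=1):
    print(f"CV-{fold_id}: train={len(train)} validation={len(validation)}")
```

Each sample appears in exactly one validation fold, so averaging a metric over the five folds uses every compound once for evaluation.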

2.6. Application Domain Analysis

To define the application domain of the FPG model, we employed a combination of structural and physicochemical descriptor-based methods. First, we calculated the Tanimoto similarity between the training set compounds and external validation/test set molecules using Morgan fingerprints to assess structural coverage. Compounds with similarity scores below a threshold (<0.5) were flagged as outside the application domain. We also performed principal component analysis on molecular descriptors to visualize the chemical space distribution. The application domain boundary was delineated using the convex hull method, where molecules falling outside the hull were considered outside the model’s reliable prediction scope. The applicability domain was further validated by analyzing prediction confidence scores, with low-confidence predictions (probability scores near 0.5) indicating potential application domain outliers.
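The similarity-based part of this check can be sketched with fingerprints represented as sets of on-bit indices. The 0.5 threshold comes from the text; comparing each query against its nearest training compound (maximum similarity) is an assumption, since the aggregation rule is not stated, and the bit sets and function names are invented for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as on-bit index sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def outside_domain(query_bits, training_fps, threshold=0.5):
    """Flag a query whose nearest training compound is below the threshold."""
    best = max((tanimoto(query_bits, fp) for fp in training_fps), default=0.0)
    return best < threshold, best

# Invented on-bit sets standing in for Morgan fingerprints.
training = [{1, 2, 3, 4}, {2, 3, 5, 8}]
print(outside_domain({1, 2, 3, 9}, training))   # (False, 0.6)
print(outside_domain({2, 10, 11}, training)[0])  # True
```

The first query shares three of five distinct bits with a training compound and stays inside the domain, whereas the second shares only one bit with either compound and is flagged.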

2.7. Web Server Implementation

Based on the FPG model, we developed an online STAT3 inhibitor activity predictor called STAT3 Pro. STAT3 Pro uses Nginx as a reverse proxy server to handle and respond to user requests. To implement the basic interaction logic of the user interface, we employed the lightweight web application framework Flask. On the frontend, graphical elements and charts are rendered using the D3.js data visualization library. Additionally, the Vue.js framework is used to create a responsive user interface layout. When users submit compound molecular data through the web page, the data is transmitted to the server backend, where it is validated and analyzed for inhibitor activity using the FPG model built on the PyTorch framework.

3. Results and Discussion

3.1. Training Results Based on the FPG-Based Model

We optimized the model’s hyperparameters using a Bayesian method. As shown in Figure 3A, a higher GNN Dropout value appears to be associated with lower validation Acc, suggesting that an excessively high dropout rate in this dataset may prevent the model from capturing enough information, leading to over-regularization. The FPN Dropout exhibits a similar behavior to GNN Dropout, with higher values seemingly hindering model performance, indicating that the model has not overfitted the training data. Larger FPN Size values are positively correlated with higher validation Acc, indicating that the model requires higher dimensionality to capture more complex features. There is no clear trend for GNN Ratio, suggesting that it is not a key performance driver or that it interacts with other hyperparameters. The Head Count and Attention Size are related to the model’s attention mechanism.

Figure 3. (A) Acc corresponding to individual hyperparameters in the hyperparameter optimization process and (B, C) the FPG model training process based on the best hyperparameter combination: (B) loss value change; (C) Acc value change.

The best performing hyperparameter combination (FPN Dropout = 0.5, GNN Dropout = 0.5, FPN Size = 550, GNN Ratio = 0.4, Head Count = 0.5, and Attention Size = 0.5) was selected for the subsequent five-fold cross-validation. As can be seen from the performance changes of the FPG model during the training process (Figure 3B,C), in the first 40 epochs, both the training and validation loss decrease rapidly, and Acc increases accordingly. After that, the model gradually converges, triggering early stopping at epoch 76, with the loss reaching its minimum value of 0.1786.

To further validate the model’s robustness, we split the dataset into 80% for training and 20% for external validation. The model was trained and validated using five-fold cross-validation repeated 100 times. As shown in Table 2, the FPG-based model achieved an average Acc of 0.7990 and AUC of 0.8716 on the external validation set.

Table 2. Five-Fold Cross-Validation Results of the FPG Model.

| metric | CV-1 | CV-2 | CV-3 | CV-4 | CV-5 | average |
| --- | --- | --- | --- | --- | --- | --- |
| Acc | 0.7904 | 0.7904 | 0.7904 | 0.7927 | 0.8496 | 0.7990 |
| Pre | 0.8045 | 0.8045 | 0.7807 | 0.7741 | 0.8499 | 0.7933 |
| Sen | 0.7035 | 0.7035 | 0.7649 | 0.8074 | 0.8121 | 0.7668 |
| Spe | 0.8714 | 0.8714 | 0.8141 | 0.7789 | 0.8846 | 0.8291 |
| AUC | 0.8679 | 0.8679 | 0.8914 | 0.8923 | 0.9089 | 0.8716 |
| MCC | 0.4912 | 0.4912 | 0.4909 | 0.4988 | 0.6317 | 0.5120 |

3.2. Comparison with Other Models

We compared the predictive performance of different molecular representation methods and modeling techniques with the FPG model in predicting STAT3 inhibitor activity. First, we used seven ML algorithms and seven molecular fingerprint methods, constructing a total of 49 ML models for predicting STAT3 inhibitor activity. As shown in Table 3, the XGBoost-Morgan, RF-Morgan, and AdaBoost-Morgan models achieved AUC scores of 0.8871, 0.8898, and 0.8871, respectively; Acc scores of 0.7954, 0.8185, and 0.8031, respectively; and MCC scores of 0.775, 0.782, and 0.766, respectively. In contrast, models based on decision trees, LR, and KNN algorithms showed lower performance in AUC (0.7222–0.8334) and Acc (0.7143–0.7529) metrics. The poor generalization of decision trees, LR, and KNN models may be due to decision trees easily creating overly complex models, reducing their generalization performance; LR models being too simple to capture key features; and KNN being sensitive to data noise and having low tolerance for errors, resulting in poor performance on external test sets.

Table 3. Prediction Results on Test Sets Using STAT3 Inhibitor Predictive Models Based on Fingerprints.

| model | fingerprint | Acc | AUC | Pre | Spe | Sen | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| KNN | Morgan | 0.7143 | 0.8221 | 0.6604 | 0.8412 | 0.5971 | 0.4485 |
| KNN | Avalon | 0.6757 | 0.7255 | 0.6358 | 0.7680 | 0.5896 | 0.3624 |
| KNN | AtomPair | 0.6911 | 0.7637 | 0.6331 | 0.8561 | 0.5373 | 0.4127 |
| KNN | RDK | 0.6873 | 0.7574 | 0.6507 | 0.7623 | 0.6194 | 0.3823 |
| KNN | TT | 0.7259 | 0.7952 | 0.6731 | 0.8422 | 0.6194 | 0.4691 |
| KNN | 2D Pharma | 0.7336 | 0.7815 | 0.7000 | 0.7840 | 0.6866 | 0.4718 |
| KNN | MACCS | 0.6486 | 0.7124 | 0.6118 | 0.8406 | 0.5597 | 0.3082 |
| DT | Morgan | 0.7223 | 0.7222 | 0.7054 | 0.7289 | 0.7164 | 0.4442 |
| DT | Avalon | 0.6718 | 0.6717 | 0.6639 | 0.6484 | 0.6944 | 0.3424 |
| DT | AtomPair | 0.7027 | 0.7011 | 0.7069 | 0.6561 | 0.7459 | 0.4042 |
| DT | RDK | 0.6988 | 0.6982 | 0.6911 | 0.6853 | 0.7163 | 0.3967 |
| DT | TT | 0.7104 | 0.7091 | 0.7119 | 0.6722 | 0.7463 | 0.4197 |
| DT | 2D Pharma | 0.7027 | 0.7030 | 0.6846 | 0.7124 | 0.6937 | 0.4058 |
| DT | MACCS | 0.6873 | 0.6862 | 0.6833 | 0.6561 | 0.7165 | 0.3732 |
| RF | Morgan | 0.8185 | 0.8898 | 0.8145 | 0.8085 | 0.8284 | 0.6133 |
| RF | Avalon | 0.8209 | 0.8788 | 0.8142 | 0.7363 | 0.8433 | 0.5837 |
| RF | AtomPair | 0.7982 | 0.8671 | 0.8099 | 0.7849 | 0.8284 | 0.5515 |
| RF | RDK | 0.7492 | 0.8563 | 0.7679 | 0.6881 | 0.8061 | 0.4982 |
| RF | TT | 0.7722 | 0.8714 | 0.7752 | 0.7440 | 0.7985 | 0.5436 |
| RF | 2D Pharma | 0.7568 | 0.8645 | 0.7541 | 0.7361 | 0.7761 | 0.5127 |
| RF | MACCS | 0.7683 | 0.8525 | 0.7826 | 0.7224 | 0.8134 | 0.5365 |
| SVC | Morgan | 0.7987 | 0.8893 | 0.7913 | 0.7887 | 0.8012 | 0.5965 |
| SVC | Avalon | 0.7777 | 0.8565 | 0.7724 | 0.7623 | 0.7911 | 0.5514 |
| SVC | AtomPair | 0.7568 | 0.8244 | 0.7422 | 0.7633 | 0.7537 | 0.5135 |
| SVC | RDK | 0.7568 | 0.8512 | 0.7532 | 0.7441 | 0.7687 | 0.5128 |
| SVC | TT | 0.7606 | 0.8511 | 0.7521 | 0.7524 | 0.7687 | 0.5207 |
| SVC | 2D Pharma | 0.7856 | 0.8535 | 0.7799 | 0.7922 | 0.7687 | 0.5603 |
| SVC | MACCS | 0.7104 | 0.7875 | 0.7155 | 0.6641 | 0.7537 | 0.4198 |
| AdaBoost | Morgan | 0.8031 | 0.9177 | 0.8425 | 0.8002 | 0.8681 | 0.6287 |
| AdaBoost | Avalon | 0.7883 | 0.8701 | 0.7768 | 0.6967 | 0.8134 | 0.5138 |
| AdaBoost | AtomPair | 0.7568 | 0.8559 | 0.7385 | 0.7683 | 0.7463 | 0.5146 |
| AdaBoost | RDK | 0.7531 | 0.8516 | 0.7671 | 0.7498 | 0.7767 | 0.5252 |
| AdaBoost | TT | 0.7866 | 0.8712 | 0.7846 | 0.7788 | 0.8084 | 0.5902 |
| AdaBoost | 2D Pharma | 0.8002 | 0.8992 | 0.8366 | 0.7621 | 0.8593 | 0.6027 |
| AdaBoost | MACCS | 0.7893 | 0.8672 | 0.7998 | 0.7577 | 0.8324 | 0.5478 |
| XGBoost | Morgan | 0.8145 | 0.9097 | 0.8115 | 0.7925 | 0.8284 | 0.6213 |
| XGBoost | Avalon | 0.8143 | 0.8946 | 0.7623 | 0.7441 | 0.7836 | 0.5281 |
| XGBoost | AtomPair | 0.7677 | 0.8588 | 0.7531 | 0.7461 | 0.7723 | 0.5181 |
| XGBoost | RDK | 0.7568 | 0.8543 | 0.7768 | 0.6969 | 0.8134 | 0.5138 |
| XGBoost | TT | 0.7761 | 0.8631 | 0.7724 | 0.7686 | 0.7916 | 0.5514 |
| XGBoost | 2D Pharma | 0.7838 | 0.8667 | 0.7634 | 0.8043 | 0.7687 | 0.5684 |
| XGBoost | MACCS | 0.8069 | 0.8617 | 0.7815 | 0.7441 | 0.8064 | 0.5515 |

Table 4 presents the prediction results of models using different representation methods on an independent test set. ML models generally outperform DL models in predicting STAT3 inhibitor activity. This may be due to the relatively small amount of available activity data: DL typically requires large amounts of data for model training, whereas traditional ML algorithms may be better suited to small-sample tasks. However, we noticed that the DL-based FPG model performed better than most descriptor-based models, especially in terms of Acc (0.8208) and AUC (0.9187). This indicates that the FPG model has a significant advantage in distinguishing between active and inactive samples. Additionally, the Sensitivity of FPG (0.8284) is the highest among the models in Table 4, tied with RF-Morgan, meaning it performs well in identifying positive samples. FPG achieved the highest MCC value (0.6354) among all models, indicating that it maintains a good balance in prediction performance. Overall, the excellent performance of the FPG model across these metrics, particularly its high scores in Acc and AUC, demonstrates its strong predictive ability for classification tasks.

Table 4. Prediction Results on Test Sets Using the Models Based on Different Representation Methods.

| model | representation | Acc | AUC | Pre | Spe | Sen | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FPG | FP + Graph | 0.8208 | 0.9187 | 0.8115 | 0.7921 | 0.8284 | 0.6354 |
| Attentive FP | Graph | 0.7375 | 0.7906 | 0.7535 | 0.7914 | 0.6853 | 0.4746 |
| GNN | Graph | 0.7226 | 0.7896 | 0.7154 | 0.7348 | 0.7097 | 0.4446 |
| CNN | Image | 0.6796 | 0.7414 | 0.6748 | 0.6947 | 0.664 | 0.3588 |
| GRU | Sequence | 0.6602 | 0.7234 | 0.6792 | 0.7405 | 0.5765 | 0.3212 |
| LSTM | Sequence | 0.6758 | 0.7483 | 0.6822 | 0.7463 | 0.5984 | 0.3492 |
| RF | Morgan FP | 0.8185 | 0.8898 | 0.8145 | 0.8081 | 0.8284 | 0.6133 |
| STAT3In | Hybrid feature (Descriptor + FP) | 0.7658 | 0.8747 | 0.7737 | 0.7563 | 0.7424 | 0.5634 |

3.3. Chemical Space Analysis Using t-SNE

To explore the effectiveness of the feature extraction process using the molecular fingerprint coupled with molecular graph representation in the FPG model, we computed the sequence embeddings (Sequence), molecular graphs (Graph), Morgan fingerprints, and the coupled molecular fingerprint and graph features (FP + Graph) for all compounds in the dataset. We then performed chemical space analysis using the t-SNE dimensionality reduction algorithm. Figure 4 compares the chemical space distributions of compounds generated by each representation method. Clearly, the FP + Graph representation provides a distinct separation between active and inactive compounds, with Morgan fingerprints also offering relatively good discrimination. In contrast, the method using only molecular graphs shows poor performance in distinguishing positive and negative samples, while the image-based molecular graph representation fails to distinguish between the two groups in chemical space.

Figure 4. Visualization of t-SNE dimensionality reduction for four feature extraction methods (mol2vec, molecule graph, Morgan fingerprint, and fingerprint + graph).

These results suggest that, in this dataset’s chemical space analysis, the coupled representation of molecular fingerprints and molecular graphs outperforms other representation methods. This may be because molecular graphs focus on the distribution of atom and bond weights, while the combined molecular fingerprints emphasize extracting substructure features that significantly contribute to classifying active molecules. The coupling of these two representations allows the model to better uncover the potential relationships between the structural features of inhibitor molecules and their activity. Although Morgan fingerprints are typically represented as high-dimensional vectors, the distance between fingerprints does not always correlate with the similarity of compounds due to hash collisions. The image-based molecular graphs performed the worst in classification, likely because these images introduce large blank areas without useful information.

Therefore, the coupling of molecular fingerprints with graph attention mechanisms offers better potential for explaining the structure–activity relationships of drugs, while also providing robust predictive performance.

3.4. Model Interpretability

The advantage of FPG lies in its dual interpretability, which combines coupled fingerprints and graph attention mechanisms. The coupled fingerprints integrate structures extracted based on prior knowledge, allowing the model to capture chemically meaningful structural features, such as functional groups and special ring structures. On the other hand, the graph attention mechanism, combined with molecular structural topology, provides a visual representation of molecular structure information. In this approach, each node represents an atom, and when calculating attention scores, the model integrates surrounding structural information. Each edge represents a chemical bond, with the score determined by the weighted attention scores of the connected atoms. By analyzing the attention scores of atoms and bonds, the model reveals the influence of molecular substructures on the target property. Based on the number of atoms in the molecule, the model automatically identifies and highlights the atoms and bonds that contribute the most to the target property.

Using a positive sample molecule from the validation set as an example, Figure 5 demonstrates the strength of the dual interpretability of the FPG model compared to other explainable models. First, the molecule was docked to the STAT3 receptor using the CDocker module in Discovery Studio to examine the interactions between the two. Then, we compared the interpretability pattern maps from attentive FP, GNN, a random forest model based on Morgan fingerprints, and FPG.

Figure 5. Interpretability comparison of four deep learning models. (A) The docking interaction pattern between the positive molecule and STAT3 receptor. (B) Attentive FP heatmap analysis. Red indicates atoms beneficial to STAT3 inhibitory activity, and blue indicates atoms that are not. (C) Heatmap analysis of the GNN model. The shading reflects the attention scores the model assigns to each bond, with darker colors indicating higher attention scores. (D) Morgan-RF explanatory analysis. Red indicates a favorable impact on activity, while blue indicates an unfavorable impact. (E) Dual explanatory analysis for FPG.

Figure 5A shows the interaction mode between the molecule and the STAT3 receptor (PDB ID 6NJS) after docking with CDocker, where quinazolinone interacts with the amino acid residue LYS 591 through multiple π–π interactions. The heatmap (Figure 5B) shows that attentive FP has not fully learned the complete favorable substructures and does not comprehensively capture the active structure for STAT3, making it unable to distinguish effectively between active and inactive STAT3 inhibitors.

The bond weight heatmap learned by the GNN model (Figure 5C) shows that the model marks almost the entire molecule, failing to identify the impact of key substructures like the quinazolinone ring on activity differences. This could be due to the presence of simpler cyclic molecules in STAT3 inhibitors, causing the model to overfit during training and making it unable to distinguish nonactive substructures.

Figure 5D shows the SHAP-based interpretation of the random forest model using Morgan fingerprints, highlighting key fingerprint information that affects activity. The model correctly predicts that quinazolinone, chlorine atoms, and carbonyl groups on heterocycles have a positive influence on inhibitor binding. However, it incorrectly identifies the amide bond (related to the carbonyl substructure) connecting the thiophene ring as detrimental to STAT3 inhibitory activity.

Figure 5E presents two types of interpretability analyses from the FPG model, based on coupled molecular fingerprints and the self-attention mechanism layer. In the self-attention mechanism heatmap, the quinazolinone structure and the thiophene ring have a favorable impact on inhibitor activity. In the molecular fingerprint layer, the most influential fingerprints for prediction are the 836th Morgan fingerprint (representing the carbonyl substructure), the 137th MACCS fingerprint (indicating whether a heterocycle is present), and the 45th MACCS fingerprint (representing a phenyl ring), all of which have a positive impact on STAT3 inhibitory activity. These fingerprints are distributed on the quinazolinone structure, and the 836th Morgan fingerprint also represents the carbonyl group attached to the thiophene ring, indicating that the molecular fingerprint layer can capture small substructures that are overlooked by the attention mechanism layer. Therefore, as inferred from the previous dimensionality reduction visualization analysis, the interpretability results from the molecular fingerprint layer and the self-attention mechanism layer complement each other. This dual interpretability is consistent with the molecular docking results, further confirming the internal consistency and interpretability of the model.

We applied the interpretability methods of the FPG model to analyze the structure–activity relationship of three representative STAT3-SH2 inhibitors: the phase III clinical drug napabucasin (PubChem ID 10331844), the phase II clinical drug stattic (PubChem ID 2779853), and the phase I clinical drug WP1066 (PubChem ID 11210478). These three compounds were selected for two reasons. First, these clinical-stage compounds have well-defined STAT3 inhibition mechanisms and abundant experimental data, providing a reliable benchmark for the interpretability analysis of the model. Second, their chemical diversity and druggability characteristics (such as the bromine atom modification in WP1066 and the covalent binding mechanism of stattic) already cover the key design strategies of STAT3 inhibitors, which is sufficient to validate the model’s ability to capture the critical substructures responsible for activity.

Figure 6A shows napabucasin, whose benzofuranone ring serves as a typical scaffold with pharmacological activity against the STAT3-SH2 domain. The heatmap analysis reveals that the interaction with the STAT3 receptor is primarily focused on the ketone group and the benzene ring. Molecular docking results suggest that the ketone group may form hydrogen bonds with certain polar amino acid residues, while the benzene ring likely interacts with aromatic amino acid residues in the receptor through π–π stacking interactions.

Figure 6. Three inhibitors, (A) napabucasin, (B) stattic, and (C) WP1066, after docking with STAT3, with the dual interpretable schematic diagram of FPG (molecular fingerprint layer and attention heatmap). The red areas in the heatmap likely indicate regions with strong binding to the receptor.

Figure 6B presents the heatmap analysis for stattic. Stattic induces ROS formation through the mitochondrial electron transport chain, and its vinyl sulfone group can form covalent irreversible interactions with STAT3. From the heatmap, the vinyl sulfone group is prominently colored, followed by the nitro group. The results show that the vinyl sulfone group forms a π–π stacking interaction with GLU616 in the receptor, while the nitro group forms hydrogen bonds with THR641 and SER614. The heatmap analysis indicates that the FPG model correctly identifies key substructures of STAT3 inhibitors, and the results are consistent with molecular docking and other reported findings in the literature.

Figure 6C shows WP1066, where the heatmap analysis aligns with the molecular docking results to some extent. The heatmap reveals that the bromine atom and the cyclic structure are colored more intensely, suggesting that these parts play a critical role in binding to the STAT3 receptor. Molecular docking results suggest that the bromine atom may interact with specific residues in the receptor through halogen bonding, while the pyridine ring and other cyclic structures may bind to aromatic residues in the receptor through π–π stacking interactions. Additionally, the nitrogen-containing phenyl ring may interact with the receptor’s polar residues via hydrogen bonds or dipole interactions, while the cyano group and phenylethyl group in the acrylamide moiety may contribute to the targeted binding affinity for STAT3. These findings suggest that the FPG model’s attention mechanism layer can capture the key structural determinants of STAT3 ligands.

3.5. Application Domain of the FPG Model

In this study, we employed standardized ecotoxicological datasets for fish, crustaceans, and algae, featuring diverse chemical classes with molecular weights of 150–800 Da and logP values of −2 to 8. The data encompassed both acute (LC50/EC50 of 0.1–100 mg/L) and chronic toxicity end points, incorporating key model organisms like Daphnia magna and Danio rerio to represent varying species sensitivities across different trophic levels.

Our results revealed that 89.7% of the external validation set compounds resided within the application domain, exhibiting high structural similarity (Tanimoto > 0.6) and descriptor coverage within the principal component analysis convex hull. These molecules showed consistent prediction accuracy (Acc = 0.82) and high confidence scores (mean probability > 0.8). The remaining 10.3% of compounds, primarily containing rare substructures or extreme descriptor values (logP > 8), were flagged as application domain outliers, with lower prediction reliability (Acc = 0.65). The model’s application domain thus effectively covers typical STAT3 inhibitor scaffolds (e.g., quinazolinones, naphthoquinones) while highlighting limitations for highly divergent chemotypes. This ensures transparent guidance for users regarding the model’s suitable application scope.
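The similarity part of this domain check can be illustrated with a minimal sketch. It assumes that fingerprints are stored as sets of on-bit indices and that a compound counts as in-domain when its nearest training-set neighbor exceeds the Tanimoto threshold of 0.6. The nearest-neighbor rule, the function names, and the toy fingerprints are illustrative assumptions, and the principal component analysis convex hull test is not covered here.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: set, training: list, threshold: float = 0.6) -> bool:
    """Assumed rule: in-domain if the most similar training compound exceeds the threshold."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best > threshold


# Toy fingerprints (hypothetical bit indices, not taken from the study).
training_fps = [{1, 4, 9, 16, 25}, {2, 3, 5, 7, 11}]

print(in_domain({1, 4, 9, 16, 36}, training_fps))  # shares 4 of 6 distinct bits with the first entry
print(in_domain({40, 41, 42}, training_fps))       # shares no bits with any training entry
```

A compound flagged as out-of-domain by such a rule would simply be reported with a reliability warning, mirroring the lower accuracy observed for the outlier group.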

Current models exhibit reduced reliability for ionizable compounds (ionic liquids), chronic/low-concentration effects (NOEC < 0.1 mg/L), and under-represented taxa (marine algae). These limitations stem primarily from data gaps in these specific domains, highlighting the need for expanded training sets incorporating mechanistic descriptors and broader taxonomic coverage in future iterations.

3.6. STAT3 Inhibitory Activator Structure

We used the functional group-based substructure fragment derivation method from PySmash, developed by Yang et al., to explore the substructure features that play a key role in STAT3 inhibitory activity. The filter parameters were set as follows: the p value derived from the binomial distribution is less than 0.05; the number of active compounds containing the substructure is greater than 5; and the Acc of the molecules containing the target substructure is greater than 0.80. The results showed that seven functional groups were significantly correlated with STAT3 inhibitory activity. As shown in Table 5, some unique substructures were frequently found in STAT3 inhibitors. For example, naphthoquinone (ID 3) and 2-acetylfuran (ID 4) appeared in 50 and 62 inhibitors, respectively. A study based on the fusion of S3I-201 and 1,4-naphthoquinone revealed the role of the naphthoquinone structure in the STAT3 active pocket, while the relationship between the 2-acetylfuran group and the blockade of STAT3 phosphorylation and inhibition of dimerization has also been validated. The α,β-unsaturated carbonyl fragment (ID 1) is a key scaffold in chalcones and acts as a Michael acceptor to react with nucleophilic agents, exhibiting broad-spectrum biological activity. Some enone derivatives that inhibit STAT3 phosphorylation lose their STAT3 inhibitory activity completely when the α,β-unsaturated carbonyl fragment is altered, indicating that this fragment has the potential for covalent binding to STAT3. The oxadiazole ring is the key pharmacophore for STAT3 inhibition in STX-0119, as it directly blocks the SH2 domain, thereby inhibiting STAT3 homodimerization and transcriptional activity. The urea group (ID 5) is often used as a linker between two structural fragments of STAT3 inhibitors, further emphasizing the importance of these substructures. Apart from the substructure in ID 8, conjugated systems typically have a significant impact on STAT3.
Additionally, the naphthoquinone (ID 3) and sulfonyl (ID 5) groups were captured by the graph attention mechanism layer of the FPG model during the preceding interpretability analysis of stattic and napabucasin, which further supports the reliability of the FPG-based model.
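The three filter criteria can be sketched as follows. This is not PySmash's implementation: the upper-tail binomial probability, the reading of Acc as the fraction of substructure-containing compounds that are active, the function names, and the example counts are all assumptions made for illustration.

```python
from math import comb


def binomial_p_value(n_total: int, n_active: int, base_rate: float) -> float:
    """Upper-tail probability P(X >= n_active) for X ~ Binomial(n_total, base_rate)."""
    return sum(
        comb(n_total, k) * base_rate**k * (1 - base_rate) ** (n_total - k)
        for k in range(n_active, n_total + 1)
    )


def keep_substructure(n_total, n_active, base_rate,
                      p_max=0.05, min_active=5, min_acc=0.80):
    """Apply the three filters: p value < 0.05, more than 5 actives, Acc > 0.80."""
    acc = n_active / n_total  # assumed meaning of Acc for a substructure
    p_value = binomial_p_value(n_total, n_active, base_rate)
    return p_value < p_max and n_active > min_active and acc > min_acc


# Hypothetical counts: the fragment occurs in 60 compounds, 50 of them active,
# in a data set where half of all compounds are active.
print(keep_substructure(n_total=60, n_active=50, base_rate=0.5))
# A fragment seen in only 6 compounds (4 active) fails the count filter.
print(keep_substructure(n_total=6, n_active=4, base_rate=0.5))
```

Requiring all three conditions at once guards against fragments that are statistically enriched but too rare, or frequent but only weakly associated with activity.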

Table 5. Representative Substructures for STAT3 Inhibitors.

3.7. STAT3 Pro Web Server Implementation

We deployed the optimal FPG model to a web platform to help researchers predict and design potential STAT3 inhibitors. As shown on the homepage of the STAT3 Pro Prediction Server (Figure 7), users can submit molecules in two ways: by uploading a CSV file containing the SMILES strings of the compounds (up to 500 SMILES entries), or by pasting the SMILES strings directly into the submission box. After the user clicks “Run analysis”, the system automatically sends the uploaded SMILES data to the integrated prediction model in the backend and returns a CSV file containing the predicted inhibitor activity results. This file includes a prediction score for each compound, reflecting the molecule’s potential to act as an inhibitor of the specified target. Additionally, the training and test sets used to build the model can be downloaded from the webpage. The STAT3 Pro website is https://gzliang.cqu.edu.cn/software/Stat3Pro.html.
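For libraries larger than the 500-entry upload limit, the input can be split into several files before submission. The sketch below is only an illustration: the single-column layout and the `SMILES` header are assumptions, since the exact CSV format expected by the server is not specified here, and the helper name is hypothetical.

```python
import csv
import io

MAX_ENTRIES = 500  # upload limit stated for the server


def smiles_batches(smiles, batch_size=MAX_ENTRIES):
    """Yield CSV text chunks, each holding at most batch_size SMILES strings."""
    cleaned = [s.strip() for s in smiles if s.strip()]  # drop blank entries
    for start in range(0, len(cleaned), batch_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["SMILES"])  # hypothetical header name
        writer.writerows([s] for s in cleaned[start:start + batch_size])
        yield buffer.getvalue()


# Placeholder library of 1200 entries built from two simple molecules.
library = ["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1"] * 600
batches = list(smiles_batches(library))
print(len(batches))  # number of files to upload
```

Each chunk can then be written to its own file and uploaded separately, and the returned result files concatenated afterwards.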

Figure 7. Home page for STAT3 Pro inhibition prediction server.

In summary, we construct a novel molecular characterization network model, FPG, which combines molecular fingerprints and graph neural networks to predict STAT3 inhibitors. We compare it with 48 models built using different molecular representations (fingerprint-based, molecular graph-based, image-based, and sequence-based), combined with ML and DL techniques. The results show that FPG achieves optimal performance on the independent test set, with an Acc of 0.8208, an AUC of 0.9187, a sensitivity of 0.8284, and an MCC value of 0.6354. Feature visualization reveals that the multifingerprint coupled graph attention mechanism effectively distinguishes active from inactive compounds. This is likely due to the molecular graph’s focus on atomic and bond weight distributions, while the mixed molecular fingerprints emphasize the extraction of substructure features that significantly contribute to the classification of active molecules. The combined representation enables the model to uncover the potential relationships among inhibitor molecules more effectively. The dual interpretability of the graph attention mechanism and molecular fingerprints provides a clearer understanding of the key features the model captures for STAT3 active molecules. Based on the best-performing FPG model, we develop a STAT3 prediction server (https://gzliang.cqu.edu.cn/software/Stat3Pro.html) to assist in the screening and design of STAT3 inhibitors.
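For readers who want to compute the same threshold-based metrics on their own predictions, a minimal sketch from confusion-matrix counts is given here. The counts in the example are hypothetical and do not reproduce the values reported above. AUC is omitted because it requires ranked prediction scores and not only class labels.

```python
from math import sqrt


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, and Matthews correlation coefficient from counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"Acc": acc, "sensitivity": sensitivity, "MCC": mcc}


# Hypothetical confusion matrix for a balanced test set of 200 compounds.
metrics = classification_metrics(tp=80, tn=85, fp=15, fn=20)
print(metrics)
```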

To further improve the FPG model’s predictive performance, we could (i) incorporate 3D conformational features and quantum chemical descriptors to better capture steric and electronic interactions, (ii) integrate transfer learning from protein–ligand interaction fingerprints, and (iii) develop an active learning framework to iteratively refine the model with newly tested compounds. These advances would enhance the model’s ability to identify novel scaffolds and optimize lead compounds while maintaining interpretability.

Supplementary Material

ao5c05380_si_001.pdf (102KB, pdf)

Acknowledgments

The authors thank Dr. Dongya Qin from Jinfeng Laboratory of Chongqing and M.Sc. Shiqi Xu from Chongqing University.

The developed STAT3 prediction server is available at https://gzliang.cqu.edu.cn/software/Stat3Pro.html.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c05380.

  • Table S1: structure and parameters setting of the deep learning models (PDF)

This work was supported by the National Natural Science Foundation of China (22176020) and the CQMU Program for Youth Innovation in Future Medicine (W0181).

The authors declare no competing financial interest.

References

  1. Dong Y., Chen J., Chen Y., Liu S. Targeting the STAT3 oncogenic pathway: cancer immunotherapy and drug repurposing. Biomed. Pharmacother. 2023;167:115513. doi: 10.1016/j.biopha.2023.115513.
  2. Caiazzo G., Caiazzo A., Napolitano M., Megna M., Potestio L., Fornaro L., Parisi M., Luciano M. A., Ruggiero A., Testa A., et al. The use of JAK/STAT inhibitors in chronic inflammatory disorders. J. Clin. Med. 2023;12(8):2865–2878. doi: 10.3390/jcm12082865.
  3. Jafarzadeh A., Nemati M., Jafarzadeh S. Contribution of STAT3 to the pathogenesis of COVID-19. Microb. Pathog. 2021;154:104836. doi: 10.1016/j.micpath.2021.104836.
  4. Wong A. L. A., Hirpara J. L., Pervaiz S., Eu J. Q., Sethi G., Goh B. C. Do STAT3 inhibitors have potential in the future for cancer therapy? Expert Opin. Invest. Drugs. 2017;26(8):883–887. doi: 10.1080/13543784.2017.1351941.
  5. Kaymaz K., Beikler T. Th17 Cells and the IL-23/IL-17 Axis in the Pathogenesis of Periodontitis and Immune-Mediated Inflammatory Diseases. Int. J. Mol. Sci. 2019;20(14):3394. doi: 10.3390/ijms20143394.
  6. Schein C. H. Repurposing approved drugs on the pathway to novel therapies. Med. Res. Rev. 2020;40(2):586–605. doi: 10.1002/med.21627.
  7. Reichardt S. D., Amouret A., Muzzi C., Vettorazzi S., Tuckermann J. P., Lühder F., Reichardt H. M. The Role of Glucocorticoids in Inflammatory Diseases. Cells. 2021;10(11):2921. doi: 10.3390/cells10112921.
  8. Chen X., Tang J., Shuai W., Meng J., Feng J., Han Z. Macrophage polarization and its role in the pathogenesis of acute lung injury/acute respiratory distress syndrome. Inflammation Res. 2020;69(9):883–895. doi: 10.1007/s00011-020-01378-2.
  9. Beebe J. D., Liu J.-Y., Zhang J.-T. Two decades of research in discovery of anticancer drugs targeting STAT3, how close are we? Pharmacol. Ther. 2018;191:74–91. doi: 10.1016/j.pharmthera.2018.06.006.
  10. Zou S., Tong Q., Liu B., Huang W., Tian Y., Fu X. Targeting STAT3 in Cancer Immunotherapy. Mol. Cancer. 2020;19(1):145. doi: 10.1186/s12943-020-01258-7.
  11. Huang Q., Zhong Y., Li B., Ouyang S., Deng L., Mo J., Shi S., Lv N., Wu R., Liu P., et al. Structure-based discovery of potent and selective small-molecule inhibitors targeting signal transducer and activator of transcription 3 (STAT3). Eur. J. Med. Chem. 2021;221:113525. doi: 10.1016/j.ejmech.2021.113525.
  12. Kong R., Bharadwaj U., Eckols T. K., Kolosov M., Wu H., Cruz-Pavlovich F. J. S., Shaw A., Ifelayo O. I., Zhao H., Kasembeli M. M., et al. Novel STAT3 small-molecule inhibitors identified by structure-based virtual ligand screening incorporating SH2 domain flexibility. Pharmacol. Res. 2021;169:105637. doi: 10.1016/j.phrs.2021.105637.
  13. Liu W., Chu Z., Yang C., Yang T., Yang Y., Wu H., Sun J. Discovery of potent STAT3 inhibitors using structure-based virtual screening, molecular dynamic simulation, and biological evaluation. Front. Oncol. 2023;13:1287797. doi: 10.3389/fonc.2023.1287797.
  14. Rafiq H., Hu J., Hakami M. A., Hazazi A., Alamri M. A., Alkhatabi H. A., Mahmood A., Alotaibi B. S., Wadood A., Huang X. Identification of novel STAT3 inhibitors for liver fibrosis, using pharmacophore-based virtual screening, molecular docking, and biomolecular dynamics simulations. Sci. Rep. 2023;13(1):20147. doi: 10.1038/s41598-023-46193-x.
  15. Szalai T. V., di Lorenzo V., Péczka N., Mihalovits L. M., Petri L., Ashraf Q. F., de Araujo E. D., Honti V., Bajusz D., Keseru G. M. Allosteric Covalent Inhibitors of the STAT3 Transcription Factor from Virtual Screening. ACS Med. Chem. Lett. 2025;16(6):991–997. doi: 10.1021/acsmedchemlett.4c00622.
  16. Wang J. H., Zhang P. J., Yu Y. L., Yi Y., Jiang Y. J., Hu S. W. Discovery of novel STAT3 inhibitors with anti-breast cancer activity: structure-based virtual screening, molecular dynamics and biological evaluation. RSC Med. Chem. 2025;16(6):2848–2865. doi: 10.1039/D5MD00053J.
  17. Cai H., Zhang H., Zhao D., Wu J., Wang L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Briefings Bioinf. 2022;23(6):bbac408. doi: 10.1093/bib/bbac408.
  18. Sohil F., Sohali M. U., Shabbir J. An introduction to statistical learning with applications in R. Stat. Theory Relat. Fields. 2022;6(1):87–87. doi: 10.1080/24754269.2021.1980261.
  19. Li M., Xiong A., Wang L., Deng S., Ye J. ACO Resampling: Enhancing the performance of oversampling methods for class imbalance classification. Knowl.-Based Syst. 2020;196:105818. doi: 10.1016/j.knosys.2020.105818.
  20. Xiong Z., Wang D., Liu X., Zhong F., Wan X., Li X., Li Z., Luo X., Chen K., Jiang H., et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Med. Chem. 2020;63(16):8749–8760. doi: 10.1021/acs.jmedchem.9b00959.
  21. Dhall A., Patiyal S., Sharma N., Devi N. L., Raghava G. P. S. Computer-aided prediction of inhibitors against STAT3 for managing COVID-19 associated cytokine storm. Comput. Biol. Med. 2021;137:104780. doi: 10.1016/j.compbiomed.2021.104780.
  22. van der Maaten L., Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
  23. Hirohara M., Saito Y., Koda Y., Sato K., Sakakibara Y. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinf. 2018;19(19):526. doi: 10.1186/s12859-018-2523-5.
  24. Yang Z. Y., Yang Z. J., Zhao Y., Yin M. Z., Lu A.-P., Chen X., Liu S., Hou T. J., Cao D. S. PySmash: python package and individual executable program for representative substructure generation and application. Briefings Bioinf. 2021;22(5):bbab017. doi: 10.1093/bib/bbab017.
  25. Zhang L., Zhang G., Xu S., Song Y. Recent advances of quinones as a privileged structure in drug discovery. Eur. J. Med. Chem. 2021;223:113632. doi: 10.1016/j.ejmech.2021.113632.


Articles from ACS Omega are provided here courtesy of American Chemical Society
