ACS Omega. 2025 Sep 18;10(38):43616–43631. doi: 10.1021/acsomega.5c03532

MetaAMPK: Accurate Prediction of Adenosine Monophosphate-Activated Protein Kinase Activators Using a Meta-Learner Neural Network

Andi Endang Kusuma Intan, Darlene Nabila Zetta, Kanokwan Jarukamjorn §, Tarapong Srisongkram §,*
PMCID: PMC12489621  PMID: 41048705

Abstract

Adenosine monophosphate (AMP)-activated protein kinase (AMPK) regulates cellular metabolism, and its activation is a promising therapeutic strategy for chronic metabolic diseases such as type 2 diabetes and nonalcoholic fatty liver disease. However, accurately predicting AMPK activators remains challenging due to the complexity of its biological data. Given the high global prevalence of chronic metabolic diseases, accelerating the discovery of novel AMPK modulators while reducing time and development costs is crucial, a goal that can be effectively addressed through an in silico drug discovery pipeline. This study developed a novel deep learning model, called MetaAMPK, that uses meta-learners based on bidirectional long short-term memory (BiLSTM) and convolutional neural network (CNN) architectures to improve the prediction of AMPK activity. The framework encodes multifeature layers, including 12 molecular fingerprints and probability features, that enable the meta-learners to achieve an accuracy of 0.91, an area under the curve (AUC) of 0.96, and a Matthews correlation coefficient (MCC) of 0.82, indicating that these models are highly accurate and robust. To further validate the prediction outcome, the meta-learners were tested with Y-randomization, permutation importance, and the applicability domain. Structural importance analysis of the test compounds confirmed that the models were able to classify the AMPK activators based on their structure. A generalization test on 53 independent compounds validated the meta-learners with 0.96 (96%) accuracy, supporting the real-world application of the developed models. Finally, molecular docking studies provided further biological validation of the predicted AMPK activators. The docking results indicate that pseudoberberine, beta-lapachone, and donepezil from the predicted AMPK activators exhibit stronger AMPK binding affinities (−8.205, −7.585, and −7.484 kcal/mol, respectively) than metformin (−5.387 kcal/mol), emphasizing the model’s capability to identify novel AMPK activators. Thus, these results show that our MetaAMPK framework provides highly accurate predictions of AMPK activators, potentially enhancing the computational drug development pipeline.



1. Introduction

AMPK (AMP-activated protein kinase) is a crucial energy-sensing enzyme that regulates cellular energy homeostasis and metabolism. The activation of AMPK has been linked to beneficial effects in metabolic syndrome, obesity, cancer, and cardiovascular health. AMPK has three subunits: α, β, and γ. The α subunit plays a significant role in inhibiting lipogenesis and promoting fatty acid oxidation by downregulating lipogenic genes such as sterol regulatory element-binding protein 1 (SREBP-1) and fatty acid synthase (FAS), and by upregulating genes involved in fatty acid breakdown, including carnitine palmitoyl transferase I (CPT1). This protein also regulates autophagy and oxidative stress in hepatocytes, affecting hepatic health. Moreover, AMPK-α activation downregulates inflammatory signaling through nuclear factor-kappa B (NF-κB), which is associated with the progression of nonalcoholic fatty liver disease (NAFLD). As a result, AMPK-α activation offers a promising therapeutic approach for metabolic syndrome and NAFLD management by reducing inflammation and lipid accumulation in the liver.

Accurately predicting AMPK activators using traditional methods is challenging due to the complexity and heterogeneity of AMPK biological data. Traditional predictive approaches often rely on limited data sets and linear relationship models, which can lead to inaccurate predictions. In contrast, machine learning (ML) techniques offer a more robust method for modeling complex drug-target interactions by capturing both linear and nonlinear relationships to enhance the predictive capabilities. Moreover, ML can identify complex relationships in large data sets, making it suitable for large-scale drug discovery processes. In particular, deep learning (DL) models that can scale to large data sets tend to achieve higher accuracy. Such models use neural network architectures capable of learning more robust and complex patterns within the data.

Among such neural network architectures, the convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) have shown significant promise in predicting drug activities in quantitative structure–activity relationship (QSAR) modeling by enhancing the accuracy and interpretability of predictions that relate molecular structures to their biological activities. CNNs excel at extracting spatial features from molecular data, which can be critical for understanding structure–activity relationships, while a BiLSTM processes input sequences in both forward and backward directions, allowing the model to capture context from both ends. This enhances the detection of structural alerts in QSAR modeling by focusing on relevant parts of the molecular representation.

Generally, a traditional predictive model relies on one algorithm or one set of features, which can bias the result toward a single selection. A meta-learner is a stacking ensemble approach in which a higher-level model learns from the outputs of multiple base learners to make the final prediction. By combining the predictions of diverse base learners, this approach leverages the strengths of different algorithms, enhances generalization, and improves robustness, especially in complex prediction tasks.

In this study, we aim to develop a highly accurate meta-learner method for screening AMPK activators, called MetaAMPK, by leveraging a diverse set of molecular fingerprints and probability features (PFs) derived from the BiLSTM and CNN models. To capture various aspects of chemical representation, we employed 12 different molecular fingerprint schemes to encode chemical structures, ensuring diverse and robust input for the meta-model. For the meta-model, we first trained 24 baseline models, consisting of 12 CNN and 12 BiLSTM fingerprint-based models. We then extracted the probability scores from these 24 baseline models and used them as a model-based representation for building an accurate meta-learner. This feature could enhance the accuracy of the meta-model and may facilitate the identification of potential drug candidates targeting the AMPK protein. The primary contributions of this study are as follows:

  1. We developed a state-of-the-art meta-learner that learns from previous experience with error feedback, specifically for classifying AMPK activators.
  2. We integrated multiple baseline models using 12 knowledge-based molecular fingerprints to learn diverse patterns of structure–activity relationships.
  3. We used both CNN and BiLSTM models within the meta-learner framework to refine predictions and improve generalization.
  4. We validated the prediction performance using an external test set and evaluated it with multiple gold-standard metrics for ML applications.

2. Methods

2.1. Data Preparation

The curated AMPK data set was obtained from a previously published study. The initial data set consists of 1628 compounds represented by SMILES strings, with 866 labeled as activators and 762 as controls. The data first underwent a cleaning process in which duplicates, invalid SMILES strings, and compounds with missing class labels were removed. SMILES strings were standardized into their canonical forms to ensure a consistent representation of molecular structures. The data were subsequently split into a training set (1140 compounds) and a test set (488 compounds) at a 70:30 ratio, which leaves enough data to train the model effectively while keeping the test set large enough to provide reliable performance estimates. For this split, we used a rational scaffold split that groups the compounds by their core structures, ensuring that molecules with similar scaffolds appear in either the training or the test set, but not both. We applied this rational split using the Astartes Python package.
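The idea of a scaffold-grouped split can be sketched as follows. This is a minimal illustration and not the Astartes implementation: the scaffold keys are assumed to be precomputed strings, and the function name `scaffold_group_split`, the largest-group-first assignment, and the toy labels are hypothetical choices made only for this example.

```python
from collections import defaultdict


def scaffold_group_split(scaffolds, train_fraction=0.7):
    """Split compound indices so that no scaffold appears in both sets.

    scaffolds: one scaffold key per compound (assumed to be precomputed).
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for index, key in enumerate(scaffolds):
        groups[key].append(index)

    target = round(train_fraction * len(scaffolds))
    train, test = [], []
    # Illustrative heuristic: place the largest scaffold groups first,
    # breaking ties by key so the result is deterministic.
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        members = groups[key]
        if len(train) + len(members) <= target:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


# Toy example: hypothetical scaffold labels for 10 compounds.
toy_scaffolds = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
train_idx, test_idx = scaffold_group_split(toy_scaffolds)
```

Because whole scaffold groups are assigned to one side, the achieved ratio only approximates the requested 70:30 when group sizes do not divide evenly.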

2.2. Molecular Feature Encoding

The 12 molecular fingerprints used in this study were: (1) atom-pair 2D descriptor (AP2D) with 780 features, (2) atom-pair 2D count descriptor (AP2DC) with 780 features, (3) chemistry development kit (CDK) with 1024 features, (4) extended CDK (CDKExt) with 1024 features, (5) CDK graph-based (CDKGraph) with 1024 features, (6) electrotopological state (Estate) with 79 features, (7) Klekota-Roth fingerprint (KRFP) with 4860 features, (8) Klekota-Roth fingerprint count (KRFPC) with 4860 features, (9) MACCS fingerprint with 166 features, (10) PubChem fingerprint with 881 features, (11) substructure fingerprint (SubFP) with 307 features, and (12) substructure fingerprint count (SubFPC) with 307 features. All molecular fingerprints were computed using the validated PaDEL-Descriptor cheminformatics software to ensure accurate molecular representation.

2.3. Meta-Learner Model

In this meta-learner framework, we constructed two sequential model layers, a baseline layer and a meta layer, as described in Figure 1. The baseline BiLSTM and CNN models were constructed using 12 molecular fingerprints. The BiLSTM model is used to capture sequential dependencies in the chemical representation that may relate to molecular activity. Meanwhile, the CNN is used to detect spatial features within molecular structures, aiding in the identification of relevant functional groups and substructures for classifying compounds as activators or controls. Each baseline architecture was independently trained on each of the 12 molecular representations (X) to predict AMPK activity, resulting in 24 sets of PFs. The baseline models are described in eq 1:

$$f_{\text{baseline}}(X) = \hat{y}_{\text{baseline}} \tag{1}$$

where $f_{\text{baseline}}$ refers to either the BiLSTM or the CNN model. The predictions from these models, $\hat{y}_{\text{baseline}}$, were aggregated to form a comprehensive PF set. This set was then used to train two meta-learners, as explained in eq 2:

$$f_{\text{meta}}(\text{PFs}) = \hat{y}_{\text{meta}} \tag{2}$$

Here, $f_{\text{meta}}$ represents either a CNN or a BiLSTM meta-learner trained on the PFs. Finally, this meta-learner made predictions on the test set, as shown in eq 3:

$$\hat{y}_{\text{final}} = f_{\text{meta}}(\text{PFs}_{\text{test}}) \tag{3}$$

Figure 1. Architecture of meta-learner development.
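The data flow from baseline predictions to the meta-learner input can be sketched in a few lines. This is only an illustration of how probability features are stacked column-wise: the functions `bit_fraction` and `first_bit` are hypothetical stand-ins for trained baseline networks, `meta_predict` is a simple averaging rule that replaces the CNN or BiLSTM meta-learner, and the two toy fingerprints replace the 12 used in the study (which would give 24 columns instead of 4).

```python
def build_probability_features(baseline_models, fingerprints):
    """Stack baseline probabilities column-wise into one PF row per compound.

    baseline_models: list of (fingerprint_name, predict_proba) pairs.
    fingerprints: dict mapping fingerprint name to a list of feature vectors,
                  one vector per compound, in the same compound order.
    """
    n_compounds = len(next(iter(fingerprints.values())))
    pf_rows = [[] for _ in range(n_compounds)]
    for fp_name, predict_proba in baseline_models:
        for row, features in zip(pf_rows, fingerprints[fp_name]):
            row.append(predict_proba(features))
    return pf_rows


def bit_fraction(bits):
    """Hypothetical stand-in for one trained baseline model."""
    return sum(bits) / len(bits)


def first_bit(bits):
    """Hypothetical stand-in for a second baseline architecture."""
    return float(bits[0])


def meta_predict(pf_row):
    """Stand-in meta-learner: average the PFs and threshold at 0.5."""
    return int(sum(pf_row) / len(pf_row) >= 0.5)


# Two toy fingerprints for two compounds (made-up bit vectors).
fingerprints = {
    "MACCS": [[1, 1, 0, 1], [0, 0, 0, 1]],
    "PubChem": [[1, 0, 1, 1], [0, 1, 0, 0]],
}
baselines = [(name, model) for name in fingerprints
             for model in (bit_fraction, first_bit)]

pfs = build_probability_features(baselines, fingerprints)
final_predictions = [meta_predict(row) for row in pfs]
```

Each PF row has one entry per (fingerprint, architecture) pair, which is why the study's 12 fingerprints and 2 architectures yield 24 probability features per compound.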

The baseline models in this architecture were trained separately before being used in the meta-learner strategy. The BiLSTM model consists of two BiLSTM layers, with the first containing 64 units and returning sequences to the second layer, which has 32 units. This is followed by a fully connected layer with 16 neurons using the ReLU activation function and an output layer with a single neuron using a sigmoid activation function for binary classification. The CNN model comprises a one-dimensional convolutional layer with 32 filters and a kernel size of 3, followed by a max-pooling layer with a pool size of 2. The output is then flattened and passed through a fully connected layer with 16 neurons using ReLU activation before reaching the final sigmoid-activated output layer. Both models use the Adam optimizer with a learning rate of 0.001 and binary cross-entropy as the loss function.
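To give a sense of how the CNN layer sizes scale with fingerprint length, the following sketch computes the length of the flattened vector that reaches the 16-neuron dense layer. It assumes details the text does not state: each fingerprint is fed as a one-channel sequence, the convolution uses no padding with stride 1, and the pooling windows do not overlap. The function name `cnn_flatten_size` is hypothetical.

```python
def cnn_flatten_size(n_features, filters=32, kernel_size=3, pool_size=2):
    """Length of the flattened CNN output for a fingerprint of n_features bits."""
    # Convolution without padding and with stride 1 (assumed).
    conv_length = n_features - kernel_size + 1
    # Non-overlapping max pooling; an incomplete last window is dropped (assumed).
    pooled_length = conv_length // pool_size
    return pooled_length * filters


# Feature counts taken from Section 2.2.
fingerprint_sizes = {"Estate": 79, "MACCS": 166, "PubChem": 881,
                     "CDK": 1024, "KRFP": 4860}
flatten_sizes = {name: cnn_flatten_size(n) for name, n in fingerprint_sizes.items()}
```

Under these assumptions, the dense layer receives between roughly 1.2 thousand inputs (Estate) and 78 thousand inputs (KRFP), so the number of trainable weights differs widely across fingerprints even though the architecture is the same.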

To determine the most effective meta-model, we tested the final meta-learner using both BiLSTM and CNN architectures. All parameters of meta-learners were set to be the same as those of the baseline models. The models were trained for 10 epochs with a batch size of 32 and a validation split ratio of 0.2.

2.4. Model Evaluation

All of the models were evaluated using classification metrics, namely accuracy, sensitivity, specificity, F1, Matthews correlation coefficient (MCC), precision, and area under the curve (AUC). These metrics were calculated from the numbers of true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP), as given in eqs 4–9, except for the AUC values, which were calculated as the area under the curve of sensitivity against 1 − specificity.

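$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\text{sensitivity} = \frac{TP}{TP + FN} \tag{5}$$

$$\text{specificity} = \frac{TN}{TN + FP} \tag{6}$$

$$F1 = \frac{TP}{TP + 0.5 \cdot (FP + FN)} \tag{7}$$

$$\text{MCC} = \frac{TN \times TP - FN \times FP}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8}$$

$$\text{precision} = \frac{TP}{TP + FP} \tag{9}$$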
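As a worked illustration of eqs 4–9, the sketch below evaluates all six count-based metrics for one confusion matrix. The function name `classification_metrics` and the example counts are made up for illustration and are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Count-based metrics of eqs 4-9; assumes both classes are present."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": tp / (tp + 0.5 * (fp + fn)),
        # MCC is reported as 0 when a row or column of the matrix is empty.
        "mcc": (tn * tp - fn * fp) / mcc_denominator if mcc_denominator else 0.0,
        "precision": tp / (tp + fp),
    }


# Made-up counts for 200 compounds.
example = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

For these counts the accuracy is 0.85 while the MCC is about 0.70, which shows how the MCC penalizes the imbalance between false positives and false negatives more strongly than accuracy does.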

2.5. Y-Randomization

To validate the robustness of the developed models, Y-randomization tests were performed by creating new data sets by randomly shuffling the class labels (Y values). The meta-model was retrained with these shuffled Y values, and the prediction performance was evaluated using the MCC metric. A robust model is expected to show a significant drop in performance when trained with shuffled Y values, whereas a nonrobust model would perform similarly to the original model. This process was repeated 100 times, and the new MCC values from the Y-randomized data sets were calculated, confirming the models’ predictive reliability and ruling out chance correlations.
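The Y-randomization procedure can be sketched on synthetic data. This is an illustration only: a 1-nearest-neighbor rule on a single made-up feature stands in for the meta-learner, the shuffled-label models are scored against the untouched test labels (one possible reading of the procedure), and all names and data are hypothetical.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def one_nn_predict(train_x, train_y, test_x):
    """1-nearest-neighbor stand-in for the meta-learner (illustration only)."""
    predictions = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        predictions.append(train_y[nearest])
    return predictions


rng = random.Random(0)


def make_data(n):
    """Synthetic one-feature data in which the feature tracks the label."""
    ys = [i % 2 for i in range(n)]
    xs = [2.0 * y + rng.gauss(0.0, 0.5) for y in ys]
    return xs, ys


train_x, train_y = make_data(100)
test_x, test_y = make_data(60)

true_mcc = mcc(test_y, one_nn_predict(train_x, train_y, test_x))

shuffled_mccs = []
for _ in range(100):  # 100 repetitions, as in the text
    shuffled_y = train_y[:]
    rng.shuffle(shuffled_y)  # break the link between features and labels
    shuffled_mccs.append(mcc(test_y, one_nn_predict(train_x, shuffled_y, test_x)))

mean_shuffled_mcc = sum(shuffled_mccs) / len(shuffled_mccs)
```

A model that has learned a real structure–activity signal should show `true_mcc` well above the distribution of `shuffled_mccs`, which is expected to scatter around zero.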

2.6. Applicability Domain Analysis

The applicability domain (AD) was established using a distance-based method, specifically the k-nearest neighbors (kNN) algorithm, to evaluate whether new compounds fall within the chemical space defined by the training data. In this method, the Euclidean distance between a new compound and its nearest neighbors in the training set is calculated. A compound is considered within the AD, and its prediction reliable, when this distance is below the average distance of the training compounds plus a scaled standard deviation. Conversely, compounds at or beyond this threshold are flagged so that their predictions are treated with caution. The threshold conditions for compounds within and outside the AD are given by eqs 10 and 11, respectively.

$$D_i < D_k + \sigma S_k \tag{10}$$

$$D_i \geq D_k + \sigma S_k \tag{11}$$

where $D_i$ is the distance of the new test compound to its k nearest training neighbors, and $D_k$ and $S_k$ are the average and standard deviation of the k-nearest-neighbor distances within the training set. The scaling factor $\sigma$ was set to 0.5.
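A minimal sketch of this kNN-based check is given below. It rests on assumptions the text leaves open: the distance of a compound is taken as the mean Euclidean distance to its k nearest training neighbors, each training compound is excluded from its own neighbor list, and the population standard deviation is used. The function names and the 2-D toy points are hypothetical.

```python
import math
import statistics


def mean_knn_distance(point, reference, k):
    """Mean Euclidean distance from point to its k nearest reference points."""
    distances = sorted(math.dist(point, other) for other in reference)
    return statistics.mean(distances[:k])


def ad_threshold(train, k, sigma=0.5):
    """Threshold D_k + sigma * S_k computed from the training set alone."""
    per_compound = [
        mean_knn_distance(p, train[:i] + train[i + 1:], k)  # leave the compound itself out
        for i, p in enumerate(train)
    ]
    return statistics.mean(per_compound) + sigma * statistics.pstdev(per_compound)


def in_domain(point, train, k, threshold):
    """True when the compound satisfies D_i < D_k + sigma * S_k."""
    return mean_knn_distance(point, train, k) < threshold


# Toy training set: a 5 x 5 grid of 2-D descriptor vectors.
train = [(float(i), float(j)) for i in range(5) for j in range(5)]
threshold = ad_threshold(train, k=3)

inside = in_domain((2.5, 2.5), train, 3, threshold)     # lies among the training points
outside = in_domain((10.0, 10.0), train, 3, threshold)  # far from all training points
```

With the small scaling factor of 0.5, the threshold sits close to the typical neighbor distance in the training set, so compounds even moderately farther away than usual are flagged as out of domain.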

2.7. Permutation Importance

To evaluate the feature importance of the meta-learner and its baseline models, we computed the permutation importance by measuring the effect of randomly shuffling each feature on model accuracy. First, the original accuracy of each of the two meta-learners was determined. Then, for each feature, its values were randomly permuted 10 times, and the accuracy was recalculated with the permuted feature. The importance score is the difference between the permuted accuracy and the original accuracy, as shown in eq 12:

$$\text{importance score} = \text{accuracy}_{\text{permuted}} - \text{accuracy}_{\text{original}} \tag{12}$$

The average importance score across 10 repetitions quantifies each feature’s contribution. This process ensures a statistically robust evaluation by assessing how disrupting each feature impacts model performance.
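The procedure can be sketched with a toy model whose behavior is known in advance. The model `toy_model`, the random data, and the function names are hypothetical; the score follows the sign convention of eq 12 as written, so a feature the model depends on receives a negative score.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model reproduces the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean of (permuted accuracy - original accuracy) for each feature."""
    rng = random.Random(seed)
    original = accuracy(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        differences = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # permute feature j across compounds
            permuted_rows = [row[:j] + [value] + row[j + 1:]
                             for row, value in zip(rows, column)]
            differences.append(accuracy(model, permuted_rows, labels) - original)
        scores.append(sum(differences) / n_repeats)
    return scores


def toy_model(row):
    """Toy classifier that uses only feature 0 and ignores feature 1."""
    return int(row[0] >= 0.5)


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

scores = permutation_importance(toy_model, rows, labels)
```

Here permuting feature 0 sharply lowers the accuracy, while permuting the ignored feature 1 leaves it unchanged, so its score is exactly zero.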

2.8. Statistical Assessment

The statistical evaluation of model performance was carried out to determine whether there were significant differences between the baseline models and the meta-learners. We tested three key performance metrics from all the models: accuracy, MCC, and AUC. To examine the distribution of each metric, the Shapiro–Wilk test was performed first. The results revealed that the distributions of all performance metrics deviated from normality (p < 0.05). Thus, we used the Mann–Whitney U test, which does not assume normality, to compare the performance of each fingerprint-based baseline model with that of each meta-model.

2.9. Molecular Docking

Molecular docking simulations were performed to study the interactions between the AMPK protein (PDB ID: 4ZHX) and a set of three ligand compounds obtained from an external set, along with one known drug as a reference. The docking process was carried out using AutoDock Vina, which was set up with a grid box centered at coordinates x: 89.1816, y: −35.9797, z: 36.0261, and dimensions of 186.3691, 158.4302, and 104.1376 Å along the x, y, and z axes, respectively. This grid box encompasses the active binding site of the AMPK protein. The ligands were prepared by energy-minimizing their structures and converting them to suitable formats for docking. AutoDock Vina was then used to generate the docking poses and binding affinities of each ligand. Following this, the results were visualized using PyMOL to analyze the binding interactions and orientations of the ligands within the protein’s active site. To further explore the interaction details, a ligand-protein interaction visualization was generated using LigPlot.

3. Results and Discussion

3.1. Chemical Space Analysis

We explored the chemical space of the AMPK molecules using 12 molecular fingerprints with a combination of t-distributed stochastic neighbor embedding (t-SNE) and kernel density estimation (KDE), as illustrated in Figure 2. We found that each fingerprint produces a distinct pattern of clusters in the embedded space. The first fingerprint, AP2D, exhibited a clear five-cluster distribution (see arrows), suggesting a high degree of substructure diversity among AMPK activators and controls. This could indicate a specific, well-defined pattern within the data set. In contrast, AP2DC showed a less isolated clustered distribution, as seen from the lack of separation of the KDE areas. This result implies that the count-based variant of the AP2D fingerprint has a low ability to cluster the AMPK distribution. The CDK, CDKExt, and CDKGraph fingerprints showed overlapping densities between the AMPK activators and controls. They also showed small, distinct islands that were clustered separately from the other major molecular groups. This suggests that these fingerprints capture only a small amount of variation compared with the AP2D fingerprint. Nevertheless, among these three fingerprints, CDKGraph is the most robust, as its graph-based approach clusters the AMPK data set into several small islands.

Figure 2. Chemical distribution of the AMPK data set using t-SNE distribution.

We also found that the EState fingerprint showed six distinct KDE peaks, suggesting that this feature represents unique and separate clusters. This pattern was also observed in the KRFP and KRFPC fingerprints. However, these two features showed broader KDE distributions, which might reflect a more diverse set of data points within these clusters. This diversity could be due to the large number (4860) of substructure features. MACCS and PubChem were characterized by moderately broad density peaks and large cluster separations. These data suggest that while the clusters are distinct, there may be some shared features or transitional data points that bridge them. Finally, the SubFP and SubFPC fingerprints stood out with unique KDE distributions. SubFP displayed a more compact distribution characterized by two large and two small clusters, whereas SubFPC exhibited a more diverse, yet less distinctly separated, set of clusters. This suggests a different distribution pattern that may contribute to the variation in model performance between these two substructure-based features.

Overall, the combined t-SNE visualization and KDE figure offer a comprehensive view of the structure of the AMPK data set, highlighting both the distinctiveness and the interrelations among the 12 fingerprints. However, none of these features show a unique separation between AMPK activators and controls, suggesting that relying solely on molecular fingerprints may not be sufficient to effectively identify AMPK activators.

3.2. BiLSTM Models Performance

The baseline BiLSTM models exhibit significant variability, highlighting the challenges and strengths of this architecture in external testing, as displayed in Table 1. Among the fingerprints tested, SubFPC yielded the most balanced performance, achieving the highest accuracy (0.768), sensitivity (0.768), and F1 score (0.768), together with a specificity of 0.769 and a robust AUC of 0.849. These results suggest that SubFPC effectively captures both positive and negative class characteristics, thereby enhancing the model’s discrimination ability. Similarly, AP2DC demonstrated strong performance with the highest specificity (0.897), precision (0.862), and AUC (0.854), indicating its suitability in contexts where minimizing false positives is critical. The PubChem, CDK, and SubFP fingerprints produced competitive metrics across most evaluation scores, particularly in AUC, supporting their reliability for ranking active compounds above inactive ones. Interestingly, the MACCS fingerprint showed a relatively high sensitivity (0.753), yet this came at the cost of lower specificity (0.653), suggesting a bias toward the active class. Meanwhile, fingerprints such as AP2D, KRFPC, and CDKExt were associated with moderate to lower performance across most metrics, indicating potential limitations in representing the chemical features relevant to the classification objective. Overall, these findings underscore the importance of fingerprint selection in deep learning-based cheminformatics models and indicate that the SubFPC and AP2DC fingerprints gave the best predictive performance with the BiLSTM architecture.

Table 1. Mean Scores of Performance Metrics for BiLSTM Models (n = 3).

| fingerprints | accuracy | sensitivity | specificity | MCC | F1 score | AUC | precision |
|---|---|---|---|---|---|---|---|
| AP2D | 0.651 | 0.546 | 0.758 | 0.372 | 0.595 | 0.672 | 0.820 |
| Estate | 0.716 | 0.576 | 0.858 | 0.454 | 0.671 | 0.788 | 0.809 |
| KRFP | 0.683 | 0.561 | 0.808 | 0.413 | 0.633 | 0.730 | 0.814 |
| PubChem | 0.751 | 0.678 | 0.826 | 0.513 | 0.733 | 0.824 | 0.805 |
| SubFP | 0.743 | 0.654 | 0.833 | 0.495 | 0.720 | 0.805 | 0.800 |
| CDKGraph | 0.745 | 0.714 | 0.775 | 0.493 | 0.737 | 0.820 | 0.767 |
| CDK | 0.746 | 0.682 | 0.812 | 0.501 | 0.730 | 0.816 | 0.791 |
| KRFPC | 0.675 | 0.514 | 0.839 | 0.373 | 0.614 | 0.710 | 0.766 |
| CDKExt | 0.684 | 0.654 | 0.715 | 0.373 | 0.676 | 0.754 | 0.704 |
| SubFPC | 0.768 | 0.768 | 0.769 | 0.543 | 0.768 | 0.849 | 0.777 |
| AP2DC | 0.760 | 0.626 | 0.897 | 0.543 | 0.724 | 0.854 | 0.862 |
| MACCS | 0.704 | 0.753 | 0.653 | 0.410 | 0.719 | 0.777 | 0.689 |

3.3. CNN Models Performance

The performance metrics of the CNN baseline models on the external test set demonstrate a clear advantage over the BiLSTM models, with noticeable variations across different fingerprints, as shown in Table 2. Among the tested fingerprints, KRFP emerged as the top performer, achieving the highest accuracy (0.898), sensitivity (0.889), MCC (0.795), and F1 score (0.897). This result underscores the high performance of the KRFP feature with the CNN framework. Additionally, CDKGraph obtained the highest specificity (0.933) and precision (0.928), with an accuracy of 0.890 and an AUC of 0.933, indicating a strong capability to reduce false positives while capturing the activator molecules. While the PubChem, SubFP, and Estate fingerprints also achieved strong performance metrics, they exhibited slightly lower MCC values than the top-tier fingerprints, suggesting that they capture the structure–activity relationship for AMPK less fully. Interestingly, while the MACCS, SubFPC, and AP2DC fingerprints did not outperform KRFP or CDKExt, they still yielded AUC scores of at least 0.920 and maintained balanced sensitivity and specificity, indicating their utility in identifying AMPK activators. Overall, the results suggest that CNNs, when paired with structurally rich fingerprints such as KRFP and CDKExt, provide a substantial enhancement in predictive capability for molecular classification tasks.

Table 2. Mean Scores of Performance Metrics for CNN Models (n = 3).

| fingerprints | accuracy | sensitivity | specificity | MCC | F1 score | AUC | precision |
|---|---|---|---|---|---|---|---|
| AP2D | 0.846 | 0.844 | 0.847 | 0.692 | 0.847 | 0.901 | 0.850 |
| Estate | 0.856 | 0.825 | 0.887 | 0.714 | 0.852 | 0.907 | 0.882 |
| KRFP | 0.898 | 0.889 | 0.906 | 0.795 | 0.897 | 0.950 | 0.906 |
| PubChem | 0.874 | 0.879 | 0.868 | 0.747 | 0.875 | 0.933 | 0.871 |
| SubFP | 0.859 | 0.835 | 0.884 | 0.721 | 0.857 | 0.922 | 0.882 |
| CDKGraph | 0.890 | 0.848 | 0.933 | 0.783 | 0.886 | 0.933 | 0.928 |
| CDK | 0.887 | 0.850 | 0.926 | 0.779 | 0.883 | 0.961 | 0.922 |
| KRFPC | 0.887 | 0.877 | 0.898 | 0.776 | 0.887 | 0.948 | 0.898 |
| CDKExt | 0.889 | 0.870 | 0.908 | 0.778 | 0.887 | 0.951 | 0.906 |
| SubFPC | 0.861 | 0.875 | 0.847 | 0.724 | 0.864 | 0.935 | 0.854 |
| AP2DC | 0.851 | 0.833 | 0.869 | 0.704 | 0.849 | 0.920 | 0.867 |
| MACCS | 0.862 | 0.874 | 0.850 | 0.727 | 0.865 | 0.930 | 0.859 |

3.4. Meta-Learner Performance

Table 3 demonstrates a significant enhancement in predictive performance for the identification of AMPK activators using meta-models. Both meta-models achieved high and closely matched evaluation metrics, underscoring the robustness and consistency of the ensemble strategy. Specifically, the meta-BiLSTM slightly outperformed the meta-CNN in terms of accuracy (0.915 vs 0.911), sensitivity (0.897 vs 0.890), and F1 score (0.914 vs 0.910), while maintaining a comparable level of specificity (0.934 vs 0.933) and AUC (0.965 vs 0.967). The high MCC for both models (0.831 for meta-BiLSTM and 0.823 for meta-CNN) further emphasizes their balanced and reliable predictive capabilities, which is particularly important for data sets with class imbalance. Additionally, both models exhibited high precision values (0.932 and 0.931, respectively), reflecting their efficacy in minimizing false positives, an essential feature in virtual screening pipelines aimed at identifying biologically relevant AMPK activators. The nearly equivalent performance of the two meta-learners indicates that the meta-learner framework effectively leverages the strengths of both sequence-based (BiLSTM) and spatial-pattern-based (CNN) architectures, resulting in a synergistic effect that enhances generalization and robustness. These findings highlight the potential of meta-learner approaches to refine and optimize predictive models in cheminformatics, especially for complex biological targets such as AMPK.

Table 3. Mean Scores of Performance Metrics for Meta-Learner Models (n = 3), Including Individual Results From Different Random Seeds.

| meta-learners (seed) | accuracy | sensitivity | specificity | MCC | F1 score | AUC | precision |
|---|---|---|---|---|---|---|---|
| BiLSTM (0) | 0.915 | 0.897 | 0.934 | 0.831 | 0.914 | 0.965 | 0.932 |
| BiLSTM (42) | 0.891 | 0.963 | 0.800 | 0.783 | 0.907 | 0.950 | 0.882 |
| BiLSTM (123) | 0.893 | 0.932 | 0.845 | 0.785 | 0.907 | 0.942 | 0.889 |
| CNN (0) | 0.911 | 0.890 | 0.933 | 0.823 | 0.910 | 0.967 | 0.931 |
| CNN (42) | 0.861 | 0.873 | 0.845 | 0.719 | 0.874 | 0.935 | 0.859 |
| CNN (123) | 0.873 | 0.881 | 0.863 | 0.743 | 0.885 | 0.936 | 0.872 |

To examine the consistency and robustness of our meta-models, additional evaluations were conducted with different train-test splits using two additional random seeds, 42 and 123. Table 3 reveals that both the BiLSTM and CNN meta-learners maintained high performance across the metrics. The alternative train-test splits show a 1–5% decrease in accuracy, MCC, F1, and precision compared to the original split (seed = 0) in both the BiLSTM and CNN models. However, they increase the sensitivity of the BiLSTM model from 0.897 to 0.963 with seed 42 and to 0.932 with seed 123. We also found that the additional train-test splits reduced the specificity by around 7–13% in both meta-models, indicating high variation in performance for identifying the inactive molecules. Meanwhile, the CNN shows less variation in sensitivity than the BiLSTM, indicating that it is more stable in identifying the active molecules against the AMPK protein. To ensure the reliability and reproducibility of our findings, we used the original train-test split (seed = 0) for all further experiments. All baseline evaluation metrics computed from the additional random seeds are reported in the Supporting Information.

Furthermore, we statistically compared the evaluation metrics of the baseline models with those of each meta-model, as summarized in Figure 3A–C. We found that the meta-CNN model consistently achieved significantly higher mean values across all fingerprints and metrics, with all combinations marked as statistically significant (p < 0.05). This reinforces the high accuracy of the meta-CNN model in capturing features from various molecular fingerprints. Meanwhile, the meta-BiLSTM model also performed well, showing significant differences in almost all combinations. However, one exception was observed in the AUC of the CDK fingerprint, which was not significantly different from that of the meta-BiLSTM model (p = 0.1666), with an average AUC value very close to that of the meta-BiLSTM (0.964). This indicates that, although the meta-BiLSTM generally outperformed the fingerprint-based baseline models, its advantage over CDK in terms of AUC was not statistically significant. These results also suggest that while both models are effective, the meta-CNN model demonstrates superior and more consistent performance across fingerprints and evaluation metrics.

Figure 3. Statistical analysis of various metrics of baseline models compared to meta-models. (A–C) represent accuracy, MCC, and AUC scores of the meta-models. The results were expressed as the mean and standard deviation of three separate experiments. The asterisk (*) symbol represents the significant difference between baseline and meta-models with p-values less than 0.05.

3.5. Confusion Matrix Performance

Figure 4A,B presents the confusion matrices of the meta-BiLSTM (A) and meta-CNN (B) models evaluated on the test set. We found that the BiLSTM achieved 230 true negatives (controls) and 207 true positives, while misclassifying 12 controls as activators (false positives) and 39 activators as controls (false negatives). These misclassifications correspond to 5% of the controls (false positives) and 16% of the activators (false negatives). These results suggest that while the BiLSTM maintains a good balance between precision and recall, it tends to underperform in identifying active compounds, as indicated by the higher number of false negatives. In contrast, the CNN model shown in Figure 4B demonstrates a more balanced prediction, with 224 true negatives and 224 true positives, accompanied by only 18 false positives and 22 false negatives. These misclassifications correspond to 7% of the controls (false positives) and 9% of the activators (false negatives). Compared with the BiLSTM model, the CNN model achieves a lower false negative rate with a slight increase in false positives, demonstrating improved accuracy in identifying AMPK activators.
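The percentages quoted above follow directly from the confusion-matrix counts, as the short check below shows. The function name `error_rates` is hypothetical; the counts are those reported in the text.

```python
def error_rates(tn, fp, fn, tp):
    """False positive and false negative rates from confusion-matrix counts."""
    return {
        "false_positive_rate": fp / (fp + tn),  # share of controls called activators
        "false_negative_rate": fn / (fn + tp),  # share of activators called controls
    }


meta_bilstm = error_rates(tn=230, fp=12, fn=39, tp=207)
meta_cnn = error_rates(tn=224, fp=18, fn=22, tp=224)

# Rounded percentages for comparison with the text.
bilstm_percent = {name: round(100 * rate) for name, rate in meta_bilstm.items()}
cnn_percent = {name: round(100 * rate) for name, rate in meta_cnn.items()}
```

Both matrices sum to 242 controls and 246 activators, that is, the 488 test compounds, so the two models are compared on the same denominators.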

Figure 4. Confusion matrices of (A) meta-BiLSTM and (B) meta-CNN models.

3.6. AD Performance

To ensure reliable predictions within a defined chemical space, we evaluated the AD performance of both the meta-BiLSTM and meta-CNN models across a range of k-neighborhoods. This analysis provides insight into the robustness and generalizability of the models by identifying and excluding out-of-domain compounds (OODs). Table 4 shows the AD analysis for the meta-BiLSTM model. We found that k = 4 was the optimal setting, resulting in the lowest number of OODs (128) while maintaining a high predictive performance, with an accuracy of 0.964 and an AUC of 0.979. The model also preserved a high MCC (0.927) and precision (0.969), reflecting its reliability and low false-positive rate. With this k, the model’s performance improved over the non-AD analysis, with accuracy increasing by up to 5% (Table 3), indicating that the best performance of the model is obtained for molecules within the AD. This result supports the reliability of our model in distinguishing AMPK activators with 96% accuracy among compounds within its applicability domain.

Table 4. Performance Metrics of Meta-BiLSTM for Different k Values.

k accuracy sensitivity specificity MCC AUC precision F1 score OODs
3 0.966 0.957 0.974 0.932 0.981 0.969 0.963 130
4 0.964 0.951 0.974 0.927 0.979 0.969 0.960 128
5 0.961 0.951 0.969 0.921 0.978 0.963 0.957 129
6 0.961 0.951 0.969 0.921 0.978 0.962 0.957 131
7 0.961 0.951 0.969 0.921 0.978 0.962 0.957 131
8 0.963 0.951 0.974 0.926 0.978 0.969 0.960 134
9 0.961 0.951 0.969 0.921 0.978 0.963 0.957 133
10 0.960 0.951 0.969 0.920 0.978 0.963 0.957 134

In parallel, the meta-CNN model achieved its optimal balance at k = 3, with 139 OODs removed, slightly more than meta-BiLSTM (Table 5). Despite this, meta-CNN retained competitive performance, attaining an accuracy of 0.960 and a higher AUC of 0.984. These findings suggest that meta-CNN maintains a strong discriminative capability even with a marginally stricter domain. The model also demonstrated excellent precision (0.958) and MCC (0.920), emphasizing its effectiveness in making accurate predictions within a well-defined AD. Moreover, the improved accuracy of the meta-CNN, rising by up to 5% for within-AD compounds, highlights the critical role of domain-restricted predictions in ensuring model reliability.

Table 5. Performance Metrics of Meta-CNN for Different k Values.

k accuracy sensitivity specificity MCC AUC precision F1 score OODs
3 0.960 0.958 0.962 0.920 0.984 0.958 0.958 139
4 0.966 0.957 0.973 0.931 0.985 0.969 0.963 140
5 0.966 0.957 0.973 0.931 0.985 0.969 0.963 140
6 0.965 0.957 0.973 0.931 0.985 0.969 0.963 141
7 0.960 0.951 0.968 0.920 0.980 0.963 0.957 136
8 0.958 0.951 0.963 0.915 0.978 0.957 0.954 135
9 0.958 0.952 0.963 0.915 0.978 0.957 0.954 134
10 0.958 0.952 0.963 0.915 0.978 0.958 0.955 133

Overall, both models showed consistent robustness across different k values, with a trade-off between slightly reduced OOD counts and performance gains. The meta-BiLSTM had a slight advantage in terms of fewer OODs and comparable metric values, suggesting a marginally broader AD boundary. Meanwhile, meta-CNN showed a higher AUC at its optimal k (0.984 versus 0.979), albeit with slightly lower precision (0.958 versus 0.969), suggesting a strong overall capacity to discriminate AMPK activators from inactive molecules. These results highlight that the AD analysis can effectively filter unreliable predictions, ensuring that both meta-models operate within regions of high confidence, thereby enhancing the reliability of AMPK activator identification in cheminformatics workflows.
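One common way to implement a kNN-based applicability domain is sketched below. It is an illustration under stated assumptions, not the authors' implementation: fingerprints are represented as sets of on-bit indices, dissimilarity is the Tanimoto distance, and a query is flagged as out-of-domain when its mean distance to the k nearest training compounds exceeds a cutoff derived from the training set (mean plus `z` standard deviations, where `z` is a hypothetical tuning parameter).

```python
from statistics import mean, pstdev


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def mean_knn_distance(query, references, k):
    """Mean distance from `query` to its k nearest reference fingerprints."""
    distances = sorted(tanimoto_distance(query, ref) for ref in references)
    return mean(distances[:k])


def fit_ad_cutoff(train, k, z=0.5):
    """Cutoff = mean + z * std of leave-one-out kNN distances in the training set."""
    leave_one_out = [
        mean_knn_distance(fp, train[:i] + train[i + 1:], k)
        for i, fp in enumerate(train)
    ]
    return mean(leave_one_out) + z * pstdev(leave_one_out)


def within_domain(query, train, k, cutoff):
    """True when the query's mean kNN distance does not exceed the cutoff."""
    return mean_knn_distance(query, train, k) <= cutoff


# Toy training fingerprints (hypothetical bit indices, not real compounds).
train = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {2, 3, 4, 5}, {1, 3, 4, 5}]
k = 3
cutoff = fit_ad_cutoff(train, k)

similar_query = {1, 2, 3, 4}     # shares most bits with the training set
dissimilar_query = {10, 11, 12}  # shares no bits with the training set
print(within_domain(similar_query, train, k, cutoff))
print(within_domain(dissimilar_query, train, k, cutoff))
```

Repeating the cutoff fit for each k and counting the queries that fall outside the domain would yield an OOD count per k, which is the kind of quantity tabulated in this section.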

3.7. Robustness Performance Using Y-Randomization

To ensure that the predictive performance of the proposed meta-models was not due to random correlations or chance, Y-randomization tests were conducted on both the meta-CNN and meta-BiLSTM models (Figure 5). In this validation, the class labels were randomly shuffled multiple times while the original feature distribution was retained. The models were then retrained and evaluated to assess their ability to distinguish signal from noise. The results show that our model (red mark) achieved a substantially higher MCC_test of approximately 0.8 for both architectures, demonstrating strong generalization performance, while the MCC_train of the shuffled models remained around 0.6. This suggests that even when trained on randomly shuffled labels, the model was still able to fit the training data to some extent, highlighting potential risks of overfitting if not properly validated. However, the sharp contrast between our test performance and that of the randomized models confirms that our model learned a true underlying relationship rather than noise correlations. These results emphasize the robustness and reliability of our models in making meaningful predictions beyond random chance.
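The Y-randomization procedure described above can be sketched as follows. This is a toy illustration only: the 1-nearest-neighbor classifier and the synthetic two-cluster data are stand-ins chosen to keep the example self-contained, not the meta-models or fingerprints used in the study, and the shuffled-label models are scored here against the untouched test labels.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def predict_1nn(train_x, train_y, query):
    """Label of the nearest training point (stand-in for a trained model)."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]


def make_toy_data(rng, n_per_class):
    """Two well-separated 2D Gaussian clusters, one per class."""
    xs, ys = [], []
    for label, center in ((0, 0.0), (1, 6.0)):
        for _ in range(n_per_class):
            xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            ys.append(label)
    return xs, ys


def y_randomization(train_x, train_y, test_x, test_y, n_rounds, rng):
    """Test-set MCC of models refit on shuffled training labels."""
    scores = []
    for _ in range(n_rounds):
        shuffled = train_y[:]
        rng.shuffle(shuffled)
        predictions = [predict_1nn(train_x, shuffled, q) for q in test_x]
        scores.append(mcc(test_y, predictions))
    return scores


rng = random.Random(0)
train_x, train_y = make_toy_data(rng, 100)
test_x, test_y = make_toy_data(rng, 50)

true_mcc = mcc(test_y, [predict_1nn(train_x, train_y, q) for q in test_x])
shuffled_mccs = y_randomization(train_x, train_y, test_x, test_y, n_rounds=20, rng=rng)
print(f"true-label MCC: {true_mcc:.2f}")
print(f"shuffled-label MCC range: {min(shuffled_mccs):.2f} to {max(shuffled_mccs):.2f}")
```

A model that has learned a real structure-activity signal should score well above every shuffled-label run, which is the contrast the test is designed to expose.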

Figure 5. Robustness performance of (A) meta-BiLSTM and (B) meta-CNN using Y-randomization.

3.8. Permutation Importance Analysis

Figure 6A shows the top 10 features of the meta-BiLSTM model. BiLSTM_xac (AP2DC) had the highest importance score (0.0115), indicating that it plays a crucial role in model accuracy. It was followed by BiLSTM_xsc (SubFPC), with an importance score of 0.0102, which also contributes substantially to the model’s performance, although less than BiLSTM_xac (AP2DC). The feature CNN_xce (CDKExt), with an importance score of 0.0080, also showed notable influence. Other features such as CNN_xsc (SubFPC) (0.0053) and CNN_xkc/KRFPC (0.0031) demonstrate moderate importance. Several features, particularly those from the BiLSTM model using different fingerprints (i.e., BiLSTM_xkc, BiLSTM_xce, BiLSTM_xke/KRFPC), exhibit relatively low importance scores, implying that they contribute minimally to the model’s performance. The CNN-based features CNN_xke and CNN_xac, both with an importance score of 0.0010, appear to be the least influential, potentially offering little predictive value. These findings suggest that while specific fingerprints provide critical predictive information, others may not meaningfully enhance model accuracy and could be reconsidered or excluded in future studies.

Figure 6. Permutation feature importance of the meta-BiLSTM and meta-CNN models. (A) Permutation feature importance of the meta-BiLSTM model, (B) permutation feature importance of the top-performing baseline BiLSTM model using the AP2DC fingerprint, (C) permutation feature importance of the meta-CNN model, and (D) permutation feature importance of the top-performing baseline CNN model using the CDK fingerprint.

Further, we analyzed the feature importance derived from the BiLSTM model with the AP2DC fingerprint and found that atom pairs at a topological distance of one bond contribute most to the predictive performance of the model (Figure 6B). The feature APC2D1_C_N, representing directly bonded carbon–nitrogen pairs, emerged as the most influential descriptor, contributing 0.2078 to the overall model importance. This indicates that the presence and configuration of C–N bonds, commonly associated with functional groups such as amines and amides, are critical structural elements influencing the target property. Other atom pair features at the same topological level also demonstrated notable contributions, including APC2D1_C_O (carbon–oxygen pair) with an importance of 0.0375, and APC2D1_C_C (carbon–carbon pair) at 0.0307. These results suggest that local atomic environments involving electronegative atoms or carbon frameworks provide meaningful structural information to the model. In contrast, features such as APC2D1_C_Cl (carbon–chlorine, importance = 0.0113), APC2D1_C_X (carbon–halogen/wildcard, 0.0018), and APC2D1_O_S (oxygen–sulfur, 0.0006) exhibited lower contributions, while APC2D1_N_O (nitrogen–oxygen pair) showed an almost negligible importance of 1.11 × 10⁻¹⁷, indicating a minimal role in model predictions. Notably, all features at a topological distance of four bonds (APC2D4) were assigned an importance value of 0, suggesting that longer-range atom pair relationships are not informative for this prediction task. This distribution of feature importance emphasizes the relevance of short-range atomic interactions, particularly those involving nitrogen and oxygen, in defining the molecular characteristics captured by the AP2DC fingerprint.

Figure 6C shows that the feature CNN_xcn (CDK fingerprint) exhibited the highest permutation importance score (0.0258), underscoring its critical role in shaping the model’s predictions. It is followed by CNN_xsc (SubFPC) (0.0219) and CNN_xce (CDKExt) (0.0156). Additionally, CNN_xpc (PubChem) (0.0141) and CNN_xcd (CDKGraph) (0.0107) maintained moderate importance for the model output. Because the CDK fingerprint is the top feature of the meta-CNN model, we evaluated it further, as illustrated in Figure 6D. FP734 stands out as the most important bit, with a permutation importance score of 0.0047, closely followed by FP399 (0.0043), FP285 (0.0043), and FP267 (0.0043), all showing very similar importance values. Other notable bits include FP591 (0.0041), FP640 (0.0041), FP81 (0.0039), FP35 (0.0039), and FP717 (0.0037), all contributing to the model’s predictive capabilities. The relatively small variation in importance values across these top features suggests a balanced contribution, highlighting that these molecular descriptors collectively play a significant role in driving classification performance.

Notably, the CDK fingerprint is a structural key-based fingerprint. These fingerprints are generated using predefined atom environments and substructure patterns; however, the individual bit positions (such as FP734) do not correspond to explicit SMARTS patterns or to substructures named in a human-readable way. Instead, each bit represents the presence or absence of a specific atom environment or path-based fragment, as determined by the CDK algorithm. The bits are therefore not directly interpretable, but they can still be helpful for feature importance ranking and model interpretation in QSAR studies.
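Permutation importance, as used in this section, measures how much a performance metric drops when one input column is randomly shuffled. The sketch below illustrates the idea with a toy stand-in: each row holds two hypothetical base-model probabilities, and `toy_meta_model` is an illustrative rule that uses only the first of them. None of these names or values come from the authors' code.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats, rng):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + (v,) + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(42)
# Each row holds two hypothetical base-model probabilities: (informative, noise).
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [int(informative >= 0.5) for informative, _ in rows]


def toy_meta_model(row):
    """Stand-in meta-model: thresholds only the first probability."""
    return int(row[0] >= 0.5)


importances = permutation_importance(toy_meta_model, rows, labels, n_repeats=5, rng=rng)
print(importances)
```

Shuffling the column the model relies on degrades accuracy, whereas shuffling the ignored column leaves it unchanged, which is why small importance scores are read as small contributions to the prediction.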

3.9. Structural Analysis

Table 6 summarizes the performance and generalization capacity of the meta-BiLSTM and meta-CNN models on an external test set comprising 20 compounds. These results provide insight into the reliability of the predictions beyond the training space, which is a critical aspect in evaluating the robustness of the developed models. The BiLSTM model correctly predicted 14 AMPK activators and 3 AMPK controls, giving 17 correct predictions or 85% accuracy. The BiLSTM model also identified 9 OODs: rosiglitazone, SB-400868-A, cordycepin, CHEMBL525385, vidarabine, trelagliptin, AKOS007854316, F864–0053, and F264–3019. The predictions for these compounds, although mostly correct, may not be fully reliable because of their distance from the training distribution. The CNN model, in turn, correctly predicted 15 AMPK activators and 3 AMPK controls, giving 18 correct predictions or 90% accuracy. It identified 10 OODs, one more than the BiLSTM model, the additional molecule being narciclasin. The CNN model thus yields slightly higher prediction accuracy but has a more restricted applicability domain than the BiLSTM model. Notably, rosiglitazone, SB-400868-A, cordycepin, CHEMBL525385, vidarabine, trelagliptin, AKOS007854316, F864–0053, and F264–3019 were consistently identified as OODs by both models, emphasizing the divergence of their structural profiles from the training data.

Table 6. Twenty Test Molecules With Their True Labels, Predicted Labels, and AD Status from Both Meta-BiLSTM and Meta-CNN Models.

compounds true label meta-BiLSTM AD (meta-BiLSTM) meta-CNN AD (meta-CNN)
calebin A activator activator within activator within
rosiglitazone activator activator outside activator outside
SB-400868-A control activator outside activator outside
narciclasin activator activator within activator outside
daphnetin activator activator within activator within
esculetin activator activator within activator within
osthole activator activator within activator within
scopoletin activator activator within activator within
GSK554170A control control within control within
CHEMBL3933251 activator activator within activator within
cordycepin activator activator outside activator outside
CHEMBL525385 control control outside control outside
vidarabine control activator outside activator outside
trelagliptin activator control outside activator outside
amarogentin activator activator within activator within
AKOS007854316 activator activator outside activator outside
GW788388 control control within control within
lutein activator activator within activator within
F684–0053 activator activator outside activator outside
F264–3019 activator activator outside activator outside
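The counts quoted for this table can be re-derived with a short tally. The sketch below transcribes the 20 rows as listed and counts correct predictions and out-of-domain compounds per model; the helper name `summarize` is illustrative only.

```python
# (compound, true label, meta-BiLSTM prediction, BiLSTM AD, meta-CNN prediction, CNN AD)
ROWS = [
    ("calebin A", "activator", "activator", "within", "activator", "within"),
    ("rosiglitazone", "activator", "activator", "outside", "activator", "outside"),
    ("SB-400868-A", "control", "activator", "outside", "activator", "outside"),
    ("narciclasin", "activator", "activator", "within", "activator", "outside"),
    ("daphnetin", "activator", "activator", "within", "activator", "within"),
    ("esculetin", "activator", "activator", "within", "activator", "within"),
    ("osthole", "activator", "activator", "within", "activator", "within"),
    ("scopoletin", "activator", "activator", "within", "activator", "within"),
    ("GSK554170A", "control", "control", "within", "control", "within"),
    ("CHEMBL3933251", "activator", "activator", "within", "activator", "within"),
    ("cordycepin", "activator", "activator", "outside", "activator", "outside"),
    ("CHEMBL525385", "control", "control", "outside", "control", "outside"),
    ("vidarabine", "control", "activator", "outside", "activator", "outside"),
    ("trelagliptin", "activator", "control", "outside", "activator", "outside"),
    ("amarogentin", "activator", "activator", "within", "activator", "within"),
    ("AKOS007854316", "activator", "activator", "outside", "activator", "outside"),
    ("GW788388", "control", "control", "within", "control", "within"),
    ("lutein", "activator", "activator", "within", "activator", "within"),
    ("F684–0053", "activator", "activator", "outside", "activator", "outside"),
    ("F264–3019", "activator", "activator", "outside", "activator", "outside"),
]


def summarize(rows, pred_idx, ad_idx):
    """Overall and within-domain accuracy plus the out-of-domain count."""
    correct = sum(row[1] == row[pred_idx] for row in rows)
    inside = [row for row in rows if row[ad_idx] == "within"]
    inside_correct = sum(row[1] == row[pred_idx] for row in inside)
    return {
        "correct": correct,
        "accuracy": correct / len(rows),
        "out_of_domain": len(rows) - len(inside),
        "within_domain_accuracy": inside_correct / len(inside),
    }


bilstm_summary = summarize(ROWS, pred_idx=2, ad_idx=3)
cnn_summary = summarize(ROWS, pred_idx=4, ad_idx=5)
print("meta-BiLSTM:", bilstm_summary)
print("meta-CNN:", cnn_summary)
```

In this table every misclassified compound lies outside the applicability domain of the model that misclassified it, so the within-domain accuracy is 100% for both models.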

To gain further insight into the structural characteristics associated with accurate predictions within the applicability domain, we examined the molecular structures of eight representative compounds that were classified as within domain by the BiLSTM model. These compounds, calebin A (1), narciclasin (2), daphnetin (3), esculetin (4), osthole (5), scopoletin (6), GSK554170A (7), and CHEMBL3933251 (8), are visualized in Figure 7. In each structure, specific atoms and bonds are highlighted in colors corresponding to the identified AP2DC substructures. Across the structures, the feature APC2D1_C_C, shown in cyan, is the most prominent substructure and highlights the carbon–carbon (C–C) bonds. This pattern is particularly dominant in compounds (1), (2), (5), (7), and (8), where it outlines the molecular backbone and the commonly observed rings. Red, representing the APC2D1_C_N substructure, highlights the nonbasic nitrogen atoms within heteroaromatic rings, visible in compounds (2), (7), and (8). The red-highlighted nitrogen is likely part of a heterocyclic system, possibly contributing to the molecule’s bioactivity through hydrogen bonding. Green, representing APC2D1_C_O, appears in structures (1), (2), (5), (6), (7), and (8) and corresponds to oxygen-containing motifs such as carbonyl (C=O), hydroxyl (OH), and ether (C–O–C) groups. These functional groups could contribute to the activation of AMPK.

Figure 7. Structural analysis of representative test set compounds within the AD of the meta-models. These compounds are (1) calebin A, (2) narciclasin, (3) daphnetin, (4) esculetin, (5) osthole, (6) scopoletin, (7) GSK554170A, and (8) CHEMBL3933251.

3.10. Molecular Docking Analysis

Molecular docking studies were conducted to evaluate the binding affinity of selected compounds from the test set that were predicted by our meta-models. The docking results, presented in Table 7, indicate that all compounds demonstrated significant binding interactions, with docking scores ranging from −8.205 to −5.387 kcal/mol. Notably, pseudoberberine exhibited the strongest binding affinity (−8.205 kcal/mol), followed by beta-lapachone (−7.585 kcal/mol) and donepezil (−7.484 kcal/mol). Metformin, used as a control drug, displayed the weakest (least negative) docking score (−5.387 kcal/mol).

Table 7. Molecular Docking and Meta-models Predictions for Predicted Compounds.

compounds docking score (kcal/mol) true class prediction
pseudoberberine –8.205 activator activator
beta-lapachone –7.585 activator activator
donepezil –7.484 activator activator
metformin (control) –5.387 activator activator

All tested compounds are true activators and were predicted as activators by the meta-models, reinforcing the reliability of our predictive framework. Moreover, each compound falls within the AD of the meta-models, with a probability score of 0.99, indicating high confidence in the predictions. The agreement between docking and ML results suggests that these compounds may possess significant potential as bioactive candidates. The superior binding affinity observed for pseudoberberine and beta-lapachone highlights their potential as lead compounds for further optimization and in vitro validation. Metformin is well-known for its role as an AMPK activator and was used as the reference. Its relatively weaker docking score aligns with expectations, further supporting the validity of our docking approach. The observed docking scores support the hypothesis that the tested compounds could serve as potential candidates for modulating the target pathway, warranting further experimental investigation.

The docking results showed that pseudoberberine, beta-lapachone, and donepezil exhibit stronger binding affinities to AMPK than metformin, with values of −8.205, −7.585, and −7.484 kcal/mol, respectively, as presented in Table 7. These compounds primarily interact with AMPK through hydrophobic interactions, whereas metformin also forms hydrogen bonds with key residues, such as Gly68, Glu368, and Asp224, represented by the pink surface (Figure 8). In addition, metformin (Figure 8A) engages in hydrophobic interactions (shown in orange) with residues including Val67, Ala71, Ile166, Tyr165, Pro225, Ala226, and Arg70. Pseudoberberine, which exhibited the strongest binding affinity, interacts with residues Val279, Arg269, Phe244, Pro367, Pro317, Pro365, Asp245, and Ile221 through hydrophobic contacts, represented in orange (Figure 8B). The remainder of the protein structure is colored cyan, and no hydrogen bonds were observed in this interaction. Similarly, beta-lapachone (Figure 8C) binds to AMPK primarily through hydrophobic interactions with residues such as His298, Arg70, Arg269, Arg299, Ser242, Val276, and Val279, and donepezil (Figure 8D) interacts with Arg224, Lys243, His151, Thr89, Thr87, Leu22, Ile150, Met85, Ile308, Vall30, Leu122, and Arg118. In both cases, the contacts are shown as orange surfaces in the 3D visualization.

Figure 8. Docking pose of the best scoring predicted compounds. These compounds are (A) metformin (control), (B) pseudoberberine, (C) beta-lapachone, and (D) donepezil.

Interestingly, all compounds except beta-lapachone reveal both oxygen (red) and nitrogen (blue) atoms in their interactions, which may indicate the presence of polar interactions and potential specificity in ligand binding. Beta-lapachone, however, only reveals oxygen atoms, pointing to a more hydrophobic binding profile. In conclusion, while pseudoberberine, beta-lapachone, and donepezil show stronger binding affinities than metformin, the lack of hydrogen bonds in these interactions suggests that their ability to activate AMPK is less certain.

Pseudoberberine is an analogue of berberine, which is known to inhibit mitochondrial respiratory complex I; it may thereby increase the AMP/ATP ratio and trigger phosphorylation of AMPKα at Thr172. Importantly, pseudoberberine also upregulated LDL receptor (LDLR) expression, suggesting its capacity to modulate both glucose and lipid metabolism. By activating AMPK, it can act on various downstream targets, including suppression of SREBP-1c to reduce lipogenesis, enhancement of GLUT4 translocation for glucose uptake, and stimulation of fatty acid oxidation. These findings support pseudoberberine’s ability to activate AMPK, encouraging its broader application in the treatment of metabolic disorders such as hyperlipidemia and type 2 diabetes.

Beta-lapachone is a bioactive compound recognized as a substrate for NAD(P)H:quinone oxidoreductase 1 (NQO1) and has been shown to activate AMPK across various cell types, contributing to its anti-inflammatory, cytoprotective, and anticancer effects. Under inflammatory conditions, it can enhance the survival of endothelial cells through activation of the NQO1–AMPK–HO-1 signaling axis. It also exhibits selective cytotoxicity in NQO1-overexpressing tumor cells by inducing apoptosis via AMPK activation, suggesting its potential as a targeted chemotherapeutic agent.

Donepezil, primarily known as a cholinesterase inhibitor for Alzheimer’s disease, also acts as an activator of AMPK phosphorylation, leading to increased mitochondrial biogenesis and ATP production via upregulation of PGC–1α and NRF-1. Beyond neurodegeneration, donepezil has also been shown to modulate inflammation and apoptosis via AMPK signaling, as demonstrated in a model of ulcerative colitis, suggesting broader therapeutic applications.

In this study, we developed a novel meta-deep learning system that integrates the CNN and BiLSTM architectures with 12 molecular fingerprints to capture complex patterns and enhance the prediction of AMPK activators. We found that the BiLSTM and CNN performed very similarly in predicting AMPK activity. We also validated our model with unseen data and a molecular docking experiment, resulting in a validated and reliable method for AMPK activator prediction. To further evaluate the generalization ability of our model, we assembled a new test set from an unseen data set obtained from another study, which comprised 51 AMPK activators and 2 AMPK inactive molecules. We then tested our two meta-models and found that both achieved a high accuracy of 0.962, suggesting strong classification performance, as shown in Table 8. These data indicate that our model generalizes to other unseen molecules.

Table 8. Generalization Test Performance Metrics Using 53 Unseen Molecules from PubChem AID1806723.

meta models accuracy
meta-BiLSTM 0.9623
meta-CNN 0.9623
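As a worked check on the arithmetic behind this table, the sketch below converts the reported accuracy into a count of correct predictions, assuming accuracy is simply correct predictions divided by the 53 molecules. It also computes, purely as a reference point, the accuracy that a hypothetical classifier labeling every molecule as an activator would obtain on the same 51/2 class split.

```python
n_activators, n_inactives = 51, 2  # class split stated in the text
n_total = n_activators + n_inactives
reported_accuracy = 0.9623  # value listed for both meta-models

# Number of correct predictions implied by the reported accuracy.
implied_correct = round(reported_accuracy * n_total)

# Reference point: accuracy of a hypothetical all-activator classifier.
all_activator_baseline = n_activators / n_total

print(implied_correct, "of", n_total, "correct")
print("all-activator baseline accuracy:", round(all_activator_baseline, 4))
```

Because the set is strongly imbalanced, accuracy here is best read alongside class-sensitive metrics such as MCC, sensitivity, and specificity, consistent with the paper's own remark that accuracy alone may not reflect robustness under class imbalance.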

To compare the effectiveness of our meta-learner with the previous study, we conducted a benchmark, as reported in Table 9. The previous study used the same data set but different ML models, including random forest (RFC), support vector machine with linear kernel (SVM-C), stochastic gradient boosting (SGB), logistic regression classifier (LRC), and a baseline deep neural network (DNN). The comparison covered nearly the same metrics as our meta-learner framework, except for MCC. Our meta-learner demonstrated competitive test accuracy (91.5% for BiLSTM and 91.1% for CNN), close to the highest-performing models from the previous study, SVM-C and SGB (each with 93.0% test accuracy), and slightly above the baseline DNN (90.6%).

Table 9. Benchmark of Machine Learning Models With the Meta-Models.

models accuracy (%) precision (%) sensitivity (%) specificity (%) AUC MCC
RFC 92.6 90.3 91.2 94.0 0.968 –
SVM-C 93.0 90.1 93.5 92.4 0.962 –
SGB 93.0 90.7 92.0 94.0 0.968 –
LRC 91.0 89.2 97.4 94.8 0.948 –
DNN 90.6 87.6 90.2 91.1 0.970 –
meta-BiLSTM 91.5 93.2 89.7 93.4 0.965 0.817
meta-CNN 91.1 93.1 89.0 93.3 0.967 0.829

Notably, both meta-BiLSTM and meta-CNN achieved precision scores above 93%, outperforming all previous models, including SGB (90.7%) and DNN (87.6%). This high precision indicates that our models are more effective in minimizing false positives, which is particularly valuable in critical applications such as drug classification, in this case the classification of AMPK activators. In addition, the specificity scores for meta-BiLSTM (93.4%) and meta-CNN (93.3%) were comparable to those of the previous models (91.1–94.8%), highlighting their strength in correctly identifying negative cases.

Although our meta-learners reported lower sensitivity than LRC, which achieved the highest sensitivity at 97.4%, they offered a good balance between sensitivity and specificity. This is reflected in the strong MCC scores (both >0.80), which provide a more holistic evaluation of model performance, particularly under class imbalance. Furthermore, the AUC values of 0.965 for meta-BiLSTM and 0.967 for meta-CNN were relatively high, although not the highest among the compared models, indicating a strong ability to separate true AMPK activators from inactive compounds. This is an essential feature for virtual screening workflows. In addition to standard validation methods, we implemented Y-randomization and AD analysis to ensure the model’s robustness and chemical relevance. Crucially, our generalization test using an independent data set achieved an accuracy of 0.96, supporting the model’s predictive capability in real-world settings. Furthermore, molecular docking was performed on three test compounds predicted by the meta-models as activators, with results showing docking scores stronger than that of a known AMPK activator drug. These findings collectively underscore the reliability and translational value of our MetaAMPK approach in identifying biologically relevant AMPK activators.

While MetaAMPK exhibits strong predictive performance, with accuracy, AUC, and MCC all exceeding 0.8 on the test set, the results of the Y-randomization test highlight the need for cautious interpretation. Specifically, the randomized models consistently yielded much lower MCC values (i.e., around 0.0–0.2), indicating that the high performance of the actual model is unlikely to be due to chance correlations. However, since the performance gap between the original and randomized models is not extreme in all cases, the data set may still contain latent biases. To further validate the model’s robustness and generalizability, we tested the meta-models on an independent data set. The consistently high accuracy of both meta-models in this evaluation supports the reliability and potential real-world applicability of the MetaAMPK models.

Our study presents a comprehensive and robust approach to predict AMPK activators by leveraging DL architectures, interpretability tools, and molecular docking validation techniques. While previous research utilized traditional ML algorithms, our work advances the predictive framework in several key aspects. First, we employed BiLSTM and CNN as baseline models. These architectures are effective in learning sequential patterns and spatial dependencies within molecular fingerprints compared to the traditional DNN model. Second, we implemented a meta-learner approach to aggregate the probability predictions of the baseline models to improve the overall accuracy and robustness. This meta-model learns from the strengths of both BiLSTM and CNN outputs, resulting in enhanced performance compared to the baseline models. The incorporation of a meta-learner represents a novel and complementary approach to AMPK activation prediction, which has not been reported in a previous study. Third, we conducted a more comprehensive model evaluation using a wide range of performance metrics, including accuracy, specificity, sensitivity, precision, F1 score, MCC, and AUC. In contrast, the previous work primarily reported accuracy, which may not fully reflect model robustness in the presence of class imbalance. Our expanded evaluation offers a more detailed assessment of model reliability and discrimination ability.

Moreover, we enhanced model interpretability through permutation importance analysis using both global and local importance scores. These scores were computed to identify the most influential fingerprint features, offering insights into the chemical characteristics that contribute to AMPK activation. Improving upon the previous study, which relied solely on a statistical comparison of descriptors, our feature importance results were subsequently used to guide a structure–activity relationship (SAR) analysis, revealing critical substructural patterns. Additionally, we assessed the model’s AD using a kNN approach across different values of k. This analysis identifies the chemical space where the model makes reliable predictions, supporting safe application of the model to novel compounds, an aspect that was not explored in the prior study. Molecular docking simulations also showed that the representative compounds predicted by our model have stronger docking scores against AMPK than the reference drug metformin. This not only supports the predictive validity of our model but also highlights potential lead compounds for future experimental validation.

Finally, the generalization capacity of our model was tested on an external AMPK data set, achieving a high predictive accuracy of 96%. This suggests that our model is not overfitted to the original training data and can maintain performance on independent compound sets. Together, these contributions distinguish our approach from prior research and demonstrate the value of combining advanced DL frameworks, model interpretation, AD analysis, and molecular docking validation in the context of drug discovery. Our findings not only improve the accuracy and interpretability of AMPK activator prediction but also offer a more rigorous and biologically meaningful framework for virtual screening in kinase-targeted drug development.

4. Conclusion

In this study, we developed a robust predictive framework by constructing 24 predictive models using two baseline architectures, CNN and BiLSTM, in combination with 12 molecular fingerprints. We further explored the impact of using different meta-models, CNN and BiLSTM, in the stacked ensemble approach to assess their effectiveness in capturing complex molecular representations. To validate the robustness and generalization of our meta-models, we applied them to an unseen AMPK data set. The promising results obtained from this external data set further confirm the reliability of our predictive framework in identifying potential AMPK activators. Moreover, molecular docking revealed the stronger binding affinity of the external set compounds compared to the known AMPK activator drug. This demonstrates the model’s applicability in real-world drug discovery scenarios, particularly in screening novel compounds for targeted proteins. Our findings indicate that both meta-models exhibit strong predictive capabilities with closely matched performance scores. However, statistical analysis reveals that the meta-CNN model consistently outperforms its counterpart, demonstrating a superior and more stable performance across various fingerprint types and evaluation metrics. This suggests that meta-CNN not only maintains high predictive accuracy but also offers greater reliability and generalizability in different molecular representations.

Supplementary Material

ao5c03532_si_001.xlsx (73.6KB, xlsx)

Acknowledgments

Ms. Andi Endang Kusuma Intan (A.I.) would like to thank the Faculty of Pharmaceutical Sciences, Khon Kaen University, for the financial support.

The raw data set used in this manuscript is available in Excel format in the Supporting Information. The Python script to reproduce this research can be downloaded from the following URL: https://github.com/taraponglab/metaampk.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c03532.

  • Raw data set used in this manuscript and the evaluation metrics of random seeds (XLSX)

Conceptualization, A.I., K.J., and T.S.; methodology, T.S.; data curation, A.I. and T.S.; formal analysis, A.I. and T.S.; software, A.I., D.Z. and T.S.; validation, A.I. and T.S.; investigation, A.I. and T.S.; resources, K.J. and T.S.; writing original draft, A.I., T.S.; writing review and editing, A.I. and T.S.; visualization, A.I. and T.S.; supervision, K.J. and T.S.; project administration, T.S.; funding acquisition, T.S.

This study requires no ethics approval as it does not involve human or animal subjects.

The authors declare no competing financial interest.

Associated Data

Supplementary Materials

ao5c03532_si_001.xlsx (73.6KB, xlsx)

Data Availability Statement

The raw data set used in this manuscript is available in Excel format in the Supporting Information. The Python script to reproduce this research can be downloaded from https://github.com/taraponglab/metaampk.


Articles from ACS Omega are provided here courtesy of American Chemical Society
