Discovery of AKT1 Inhibitors for Obesity and Metabolic Dysfunction-Associated Steatotic Liver Disease Using QSAR-Guided Virtual Screening and Gaussian Accelerated Molecular Dynamics

Kun Cao; Ruonan Wang; Dong Ou; Siyu Wu; Yiyao Chen; Lianhai Li; Xinguang Liu

doi:10.1021/acsomega.5c11688

ACS Omega. 2026 Mar 23;11(13):20461–20479. doi: 10.1021/acsomega.5c11688

Kun Cao ^†,^*, Ruonan Wang ^‡, Dong Ou ^†,^§, Siyu Wu ^†,^§, Yiyao Chen ^†,^§, Lianhai Li ^†,^∥,^*, Xinguang Liu ^†,^*

PMCID: PMC13063166 PMID: 41970871

Abstract

The serine/threonine kinase AKT1 (RAC-α protein kinase) functions as a central node of the PI3K/AKT/mTOR signaling pathway, regulating key biological processes such as glucose uptake, lipid metabolism, cell growth, and survival. Persistent activation of this pathway has been strongly implicated in the pathogenesis of metabolic disorders, particularly obesity and metabolic dysfunction-associated steatotic liver disease (MASLD), where it contributes to insulin resistance, hepatic steatosis, and progression toward steatohepatitis. Despite its recognized importance, the development of selective AKT1 inhibitors for metabolic disease applications remains limited. In this study, we implemented an integrated computational pipeline that combines quantitative structure–activity relationship (QSAR) modeling, structure-based virtual screening, molecular docking, Gaussian accelerated molecular dynamics (GaMD) simulations, and MM-GBSA binding free energy analysis to identify novel AKT1 inhibitors. A total of 9361 raw bioactivity records were retrieved from the ChEMBL database and systematically curated to yield a high-quality data set of 2711 compounds with validated IC50 values. QSAR models constructed from this data set demonstrated robust predictive power and were employed to prioritize potential active scaffolds. Subsequent virtual screening and docking identified several promising candidates, with NPC134413, NPC277306, and NPC469442 exhibiting superior binding affinities (−9.42, −9.36, and −9.07 kcal/mol, respectively) compared to the cocrystallized reference ligand (−7.09 kcal/mol). Molecular dynamics simulations confirmed the stability of these complexes, revealing persistent hydrogen bonds and ionic contacts with critical catalytic residues, including Met281, Glu234, Asp292, and Lys277. Structural stability was further supported by RMSD, RMSF, RoG, and PCA analyses, which demonstrated restricted conformational fluctuations in the ligand-bound states. 
MM-GBSA free energy calculations reinforced these findings, with NPC469442 (−48.54 kcal/mol) displaying the most favorable binding energetics, surpassing the reference complex. Overall, this integrative framework highlights structurally diverse and energetically favorable AKT1 inhibitors with strong therapeutic promise for obesity and MASLD. The results provide a rational basis for advancing these hits toward experimental validation and underscore the utility of combining QSAR-guided screening with GaMD simulations for drug discovery in metabolic diseases.

1. Introduction

Metabolic diseases, particularly obesity and metabolic dysfunction-associated steatotic liver disease (MASLD), have emerged as major global health challenges, contributing substantially to morbidity and mortality worldwide. The prevalence of obesity has nearly tripled since 1975, with over 650 million adults classified as obese in 2016 according to the World Health Organization. Simultaneously, MASLD, the most common chronic liver disorder, affects up to a quarter of the global adult population and is increasingly recognized as a driver of end-stage liver disease, hepatocellular carcinoma, and cardiovascular complications. These conditions are closely interconnected, as obesity represents a primary risk factor for MASLD development and progression, as well as for type 2 diabetes mellitus and metabolic syndrome. The pathogenesis of obesity and MASLD is multifactorial, encompassing genetic, epigenetic, environmental, and behavioral influences. Lifestyle factors such as energy-dense diets, reduced physical activity, and increased sedentary behavior are pivotal in the recent upsurge of these diseases. However, the underlying molecular mechanisms are complex, involving numerous perturbations in metabolic and inflammatory pathways, and remain incompletely understood. Among the cellular signaling networks implicated in metabolic homeostasis, the phosphoinositide 3-kinase (PI3K)/Akt pathway has attracted particular attention due to its central role in nutrient sensing, energy metabolism, and cell survival.

Akt, also known as protein kinase B (PKB), is a serine/threonine kinase that operates as a critical node in transducing extracellular signals, particularly those triggered by insulin and growth factors, into diverse intracellular responses. Activation of Akt is initiated by its binding to PI3K-generated phosphatidylinositol (3,4,5)-trisphosphate (PIP3) at the plasma membrane, followed by phosphorylation by phosphoinositide-dependent kinase-1 (PDK1) and the mammalian target of rapamycin complex 2 (mTORC2). Once activated, Akt orchestrates a range of downstream signaling events that regulate glucose uptake (via GLUT4 translocation), glycogen synthesis, lipid metabolism, protein synthesis, and inhibition of apoptosis.

In MASLD, the disruption of hepatic insulin signaling, particularly via defective Akt activation, leads to imbalances between hepatic lipid acquisition and disposal (by β-oxidation and very-low-density lipoprotein secretion). This results in hepatocellular triglyceride accumulation, which underlies steatosis and can trigger progressive liver injury, inflammation, and fibrosis. Furthermore, Akt dysregulation affects mitochondrial dynamics, oxidative stress, autophagy, and pro-inflammatory cytokine production, facilitating the transition from simple steatosis to nonalcoholic steatohepatitis and advanced fibrosis. Notably, recent studies have revealed the existence of distinct Akt isoforms (Akt1, Akt2, and Akt3), each exhibiting tissue-specific expression patterns and nonredundant functions in metabolic regulation. AKT2 has been strongly associated with systemic insulin signaling and glucose homeostasis, whereas AKT1 is more prominently linked to cell growth, survival signaling, and regulation of lipid metabolic processes. Emerging evidence suggests that dysregulated AKT1 signaling contributes to hepatic lipid accumulation, altered mitochondrial function, and inflammatory responses that are characteristic of MASLD progression. However, the role of AKT signaling in metabolic disease is complex and context-dependent, as both insufficient and excessive pathway activation may disrupt metabolic balance. Therefore, selective and finely tuned modulation of specific Akt isoforms, rather than global pathway inhibition, represents a more rational therapeutic strategy.

Current pharmacological interventions for obesity and MASLD remain limited, with few approved agents demonstrating satisfactory efficacy or safety profiles. Accordingly, there is a pressing need to identify novel molecular targets and therapeutic agents. The PI3K/Akt pathway, given its centrality in metabolic regulation, has become a promising target for the development of new drugs. Small-molecule inhibitors of Akt have shown preclinical efficacy in ameliorating metabolic abnormalities, yet their clinical translation has been hindered by issues of selectivity, toxicity, and pharmacokinetics.

Recent advances in computational approaches such as QSAR modeling, virtual screening, and molecular dynamics (MD) simulations have revolutionized the early stages of drug discovery by enabling the rapid identification and optimization of bioactive compounds. In silico drug discovery methodologies have become indispensable tools for accelerating target-based drug development. In particular, the integration of QSAR modeling, structure-based virtual screening, and molecular dynamics simulations has enabled efficient exploration of large chemical spaces, improved prediction of ligand potency, and detailed characterization of protein–ligand interaction dynamics at atomic resolution. Enhanced sampling techniques, such as GaMD, have further improved the ability to capture rare conformational transitions and binding stability that are often inaccessible to conventional MD simulations. These computational strategies have been successfully applied to a wide range of metabolic and kinase targets, demonstrating their value in guiding rational inhibitor design and reducing experimental attrition in early-stage drug discovery. In this study, we employed an integrated QSAR-guided virtual screening workflow combined with molecular dynamics simulations to systematically identify and evaluate novel small-molecule inhibitors targeting Akt. This approach enabled the prioritization of candidates with favorable predicted bioactivity, binding affinity, and dynamic stability. The top-ranked compounds exhibited robust interactions with the Akt active site and maintained structural stability in silico, providing a rational foundation for subsequent experimental validation and therapeutic development in metabolic diseases.

2. Experimental Section

2.1. Retrieval and Curation of the Data Set

Bioactivity data for RAC-α serine/threonine-protein kinase (AKT1, ChEMBL Target ID: CHEMBL4282) were obtained from the ChEMBL database, a comprehensive repository of curated pharmacological information. A total of 9,361 raw activity records were retrieved, encompassing a diverse range of experimental end points, including IC₅₀, Ki, ED₅₀, Kd, EC₅₀, pIC₅₀, Vmax, Km, Tm, and various measures of inhibition, residual activity, and control values. To ensure data set uniformity, reliability, and suitability for QSAR modeling, a systematic curation and harmonization process was applied. Records were first filtered to retain only those reporting half-maximal inhibitory concentration (IC₅₀) values, expressed in nanomolar (nM) units, as this represents the most widely accepted measure of inhibitory potency. To minimize assay heterogeneity, only IC₅₀ values derived from biochemical or enzymatic inhibition assays targeting AKT1 were retained, while cell-based assays or indirect functional readouts were excluded when explicitly annotated in ChEMBL. When multiple IC₅₀ measurements were available for the same compound, experimental variability was addressed by aggregating values using the median IC₅₀, thereby reducing the influence of interassay variation. In addition, potential outlier IC₅₀ values were identified on log-transformed activity data using an interquartile range (IQR)-based filtering approach and removed when they deviated substantially from the central distribution. Entries with missing, ambiguous, or inconsistent annotations were excluded, and duplicate records were eliminated by cross-checking unique compound identifiers. Chemical structures were obtained as SMILES strings and subjected to standardization using cheminformatics protocols, including removal of salts and counterions, normalization of protonation states, and canonicalization to ensure consistent molecular representation. 
After applying these filtering, harmonization, and standardization steps, the final curated data set comprised 2,711 unique compounds with standardized structures and harmonized IC₅₀ values, which was subsequently used for molecular fingerprint generation and QSAR model development.
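The aggregation and outlier steps described above can be illustrated with a minimal sketch. It assumes the IQR filter is applied across the per-compound medians on a log₁₀ scale and uses the conventional 1.5 × IQR fence; neither detail is stated in the text, and the function name `curate_ic50` and the toy records are hypothetical.

```python
import math
from statistics import median, quantiles

def curate_ic50(records, k=1.5):
    """records: iterable of (compound_id, ic50_nM) pairs.

    Returns {compound_id: log10(median IC50 in nM)} after IQR filtering.
    The fence multiplier k=1.5 is an assumption, not a value from the paper.
    """
    # Group replicate measurements per compound; skip missing or non-positive values.
    grouped = {}
    for compound_id, ic50_nM in records:
        if ic50_nM is None or ic50_nM <= 0:
            continue
        grouped.setdefault(compound_id, []).append(ic50_nM)

    # Aggregate replicates with the median, then move to a log scale.
    log_ic50 = {cid: math.log10(median(vals)) for cid, vals in grouped.items()}

    # Interquartile-range fences computed on the log-transformed values.
    q1, _, q3 = quantiles(list(log_ic50.values()), n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {cid: v for cid, v in log_ic50.items() if lower <= v <= upper}

# Toy records (hypothetical): compound A has three replicates, F is an extreme value.
records = [("A", 10), ("A", 30), ("A", 20), ("B", 100), ("C", 50),
           ("D", 200), ("E", 80), ("F", 1e9), ("G", None)]
curated = curate_ic50(records)
```

In this toy example the three replicates of compound A collapse to a single median value, the record with a missing value is skipped, and the extreme value for F falls outside the upper fence.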

2.2. Molecular Fingerprints and Feature Generation

Molecular descriptors and fingerprints were generated using PaDEL-Descriptor to comprehensively represent the chemical space of the curated AKT1 data set. Twelve fingerprint classes were calculated, including CDK and CDK-extended (path-based encodings capturing atom sequences and ring systems), PubChem (881 predefined fragments), and MACCS keys (166 structural features frequently used in QSAR). Additional representations such as E-State, Substructure and Substructure count fingerprints encoded electronic and functional group environments, while the high-dimensional Klekota-Roth and Klekota-Roth count fingerprints (4860 bits) provided detailed fragment-based mappings optimized for SAR studies. Atom Pairs and Atom Pairs count fingerprints further described atom–atom relationships and topological distances, allowing discrimination between molecules with similar scaffolds but different connectivity. Each fingerprint was generated in binary or count-based form, depending on definition, yielding feature spaces that ranged from compact (MACCS) to highly detailed (Klekota-Roth). In addition to fingerprints, continuous molecular descriptors including molecular weight (MW), log P, hydrogen bond donors (HBD), and hydrogen bond acceptors (HBA) were computed to assess drug-likeness based on Lipinski’s rule of five. While these descriptors were not used as inputs for QSAR modeling, they were retained for pharmacokinetic evaluation of prioritized hits. Features with missing values or zero variance were excluded, and the resulting matrices were exported in binary or count-based formats suitable for machine learning. As Random Forest algorithms are scale-invariant, no normalization was applied. This strategy ensured a balanced representation of structural motifs and physicochemical features, capturing both global scaffold-level and fine fragment-level characteristics essential for accurate QSAR predictions.

2.3. QSAR Model Development and Validation

Quantitative structure–activity relationship models were constructed using the Random Forest (RF) algorithm implemented in scikit-learn, selected for its robustness, resistance to overfitting, and proven reliability in cheminformatics applications. The data set of 2711 compounds was randomly divided into a training set (80%) for model building and an external test set (20%) for independent evaluation. Activity values (IC₅₀) were converted to pIC₅₀ (−log₁₀ of the molar IC₅₀) prior to modeling to normalize activity scales. Hyperparameter optimization was performed through a combination of random and grid search, exploring parameters such as the number of estimators, maximum tree depth, feature selection strategy, and minimum split size, with the optimal configuration determined via 10-fold cross-validation on the training set. Model performance was assessed using multiple statistical criteria, including the coefficient of determination (R²), root-mean-square error (RMSE), mean absolute error (MAE), and the modified r²m metric, which accounts for external predictivity and penalizes overfitting. External model performance was evaluated on the held-out test set using the same metrics. To ensure the reliability of predictions, applicability domain analysis was conducted using the leverage approach, where the warning leverage threshold (h*) was defined as h* = 3(p + 1)/n, with p representing the number of descriptors and n the number of training compounds. Molecules falling outside this domain were flagged as unreliable, ensuring that scaffold prioritization and virtual screening considered only predictions within the validated chemical space, thereby increasing confidence in QSAR-driven hit selection.
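As a worked illustration of the two formulas in this section, the sketch below converts a nanomolar IC₅₀ to pIC₅₀ and evaluates the warning leverage h* = 3(p + 1)/n. The descriptor count (p = 166, the nominal MACCS length) and the training-set size (n = 2169, roughly 80% of 2711) are illustrative assumptions; the function names are hypothetical, and the leverage values themselves (the diagonal of the hat matrix) are not computed here.

```python
import math

def pic50_from_nM(ic50_nM):
    """pIC50 = -log10(IC50 in mol/L); for nanomolar input this is 9 - log10(IC50_nM)."""
    return 9.0 - math.log10(ic50_nM)

def warning_leverage(p, n):
    """Warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds."""
    return 3.0 * (p + 1) / n

def in_applicability_domain(leverage, p, n):
    """A compound is treated as inside the domain when its leverage does not exceed h*."""
    return leverage <= warning_leverage(p, n)

# Illustrative values only (assumptions, not taken from the paper's models).
example_pic50 = pic50_from_nM(100.0)        # a 100 nM inhibitor
h_star = warning_leverage(p=166, n=2169)    # about 0.231
```

With these assumed sizes, a query compound whose leverage exceeds roughly 0.23 would be flagged as outside the applicability domain.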

2.4. Virtual Screening and Molecular Docking

Following QSAR-guided prioritization, top-ranked compounds from the NPC and ZINC libraries were subjected to structure-based molecular docking to characterize their binding orientations and affinities within AKT1. The crystal structure of AKT1 (PDB ID: 4GV1) was retrieved from the Protein Data Bank and prepared using UCSF Chimera, employing its integrated AutoDock Vina docking engine. Protein preparation included removal of crystallographic water molecules beyond 5 Å from the binding site, deletion of alternate conformations, and addition of missing hydrogen atoms. Protonation states of titratable residues were assigned at physiological pH (7.4), followed by restrained energy minimization to relieve local steric clashes. Ligand structures were obtained as SMILES strings and converted to three-dimensional conformations using OpenBabel. Ligands were protonated at pH 7.4, energy-minimized using the MMFF94 force field, and prepared in PDBQT format using AutoDockTools, with all rotatable bonds treated as flexible. Docking calculations were carried out using AutoDock Vina’s empirical scoring function, which estimates binding affinity based on a hybrid knowledge-based and force-field-derived potential. The docking grid was defined to encompass the ATP-binding cleft of AKT1, which includes the hinge region and catalytic residues essential for inhibitor recognition. A cubic grid box of 40 × 40 × 40 Å was centered on the hinge-binding site defined by the cocrystallized ligand, with grid coordinates at x = 11.64, y = 25.71, z = 37.85 Å. This configuration ensured that the pocket containing Met139, Glu92, Asp150, and Lys135, key residues involved in stabilizing kinase inhibitors, was fully covered. Docking was performed with an exhaustiveness value of 16, and up to 20 binding poses were generated for each ligand. Docking results were evaluated based on predicted binding affinities (kcal/mol) and visual inspection of interaction profiles. 
Key stabilizing interactions such as hydrogen bonds with hinge residues, π–π stacking with aromatic residues, and hydrophobic contacts in the ATP pocket were used as criteria for ranking. Comparison with the cocrystallized reference ligand confirmed correct placement within the binding site, validating the docking approach.
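For readers who wish to reproduce the search space outside UCSF Chimera, the grid settings above map onto a standard AutoDock Vina configuration file. The following is a minimal sketch that only writes such a file as text; the receptor and ligand file names are hypothetical placeholders, and the helper `vina_config` is not part of any library.

```python
def vina_config(receptor, ligand, center, size=(40.0, 40.0, 40.0),
                exhaustiveness=16, num_modes=20):
    """Build the text of an AutoDock Vina config file from the reported grid settings."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines) + "\n"

# File names below are placeholders (assumptions); grid values are those reported in the text.
config = vina_config("akt1_4gv1.pdbqt", "ligand.pdbqt", center=(11.64, 25.71, 37.85))
```

Writing `config` to a file and passing it to the Vina executable with `--config` would apply the same box center, box size, exhaustiveness, and pose limit to every ligand.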

2.5. Molecular Dynamics Simulations

Gaussian Accelerated Molecular Dynamics simulations were performed using AMBER v24 to evaluate the stability and dynamic behavior of the AKT1–ligand complexes. The ff19SB force field was applied to the protein, while ligand parameters were generated using the GAFF2 force field with AM1-BCC charges assigned via the Antechamber module, as GAFF2 provides improved parametrization for heterocyclic and aromatic moieties commonly present in kinase inhibitors, thereby enhancing the accuracy of ligand conformational and interaction modeling. Each complex was solvated in a rectangular TIP3P water box with a 12 Å buffer, and counterions (Na⁺/Cl⁻) were added to neutralize the systems using tleap. This setup ensures a physiologically relevant environment while minimizing artificial boundary effects, consistent with previous reports employing GaMD for kinase simulations. Each solvated system underwent a two-stage minimization protocol: 5000 steps of steepest descent followed by 5000 steps of conjugate gradient minimization. The systems were then gradually heated from 0 to 310 K over 200 ps under constant volume (NVT) conditions. This was followed by 1 ns of equilibration under constant pressure (NPT) at 1 bar, maintained using isotropic position scaling. The Langevin thermostat (collision frequency of 2 ps⁻¹) controlled the temperature, and long-range electrostatics were treated using the Particle Mesh Ewald (PME) method with a 10 Å cutoff. The SHAKE algorithm was applied to constrain all bonds involving hydrogen atoms, allowing for a 2 fs integration time step. These equilibration strategies follow standard best practices in MD protocols. Prior to GaMD, short conventional MD runs were performed to collect potential energy statistics required for boost parametrization. 
Gaussian accelerated molecular dynamics (GaMD) production runs were then conducted using a dual-boost scheme (igamd = 3), in which boost potentials were applied to both the dihedral and total potential energies. GaMD simulations were continued directly from the equilibrated structures (irest_gamd = 1), without additional conventional MD or boost equilibration phases (ntcmd = 0, nteb = 0). Boost parameters were adaptively determined using potential energy statistics accumulated with an averaging window of 50,000 steps (ntave = 50,000), and the upper limits of the boost potential standard deviations were set to 6.0 kcal/mol for both dihedral and total potential terms (sigma0D = sigma0P = 6.0), following recommended GaMD practices. Each AKT1–ligand complex was simulated for 200 ns under NPT conditions using a 2 fs time step, with trajectory snapshots saved every 10 ps. Free energy reweighting was performed using second-order cumulant expansion to recover unbiased free energy profiles from the GaMD trajectories.
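The production settings above imply specific step counts, which the following sketch works out and collects into an AMBER-style `&cntrl` fragment. It is an incomplete illustration rather than the input file used in this study: only parameters stated in the text are included, the mapping of SHAKE to `ntc = 2`/`ntf = 2` and of Langevin dynamics to `ntt = 3` follows common AMBER usage and is an assumption, and restart and output options are omitted.

```python
def md_step_counts(total_ns, dt_fs, save_every_ps):
    """Return (nstlim, ntwx, n_frames) for a run of total_ns nanoseconds."""
    steps_per_ps = round(1000.0 / dt_fs)      # a 2 fs step gives 500 steps per ps
    nstlim = total_ns * 1000 * steps_per_ps   # total number of MD steps
    ntwx = save_every_ps * steps_per_ps       # steps between saved snapshots
    return nstlim, ntwx, nstlim // ntwx

nstlim, ntwx, n_frames = md_step_counts(total_ns=200, dt_fs=2.0, save_every_ps=10)

# Parameters taken from the text; ntc/ntf/ntt mappings are assumed conventions.
gamd_params = {
    "nstlim": nstlim, "dt": 0.002, "ntwx": ntwx,
    "ntc": 2, "ntf": 2,                # SHAKE on bonds involving hydrogen
    "ntt": 3, "gamma_ln": 2.0, "temp0": 310.0,
    "ntb": 2, "ntp": 1, "cut": 10.0,   # NPT with isotropic scaling, 10 A cutoff
    "igamd": 3, "irest_gamd": 1, "ntcmd": 0, "nteb": 0,
    "ntave": 50000, "sigma0P": 6.0, "sigma0D": 6.0,
}
namelist = "&cntrl\n" + "".join(f"  {k} = {v},\n" for k, v in gamd_params.items()) + "/\n"
```

Under these settings each 200 ns trajectory corresponds to 100,000,000 integration steps and 20,000 saved snapshots.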

2.6. Trajectory and Free Energy Analyses

Trajectory analyses were performed using CPPTRAJ and PyTraj. Protein stability was assessed by calculating root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (RoG). Hydrogen bond occupancy and hydrophobic interaction persistence were quantified to characterize key ligand–protein contacts. Principal component analysis (PCA) was applied to the covariance matrix of backbone atoms to identify dominant motions, while the free energy landscape (FEL) was reconstructed using the first two principal components. Binding free energies were calculated postsimulation using the MM-GBSA method, with 1000 evenly spaced snapshots extracted at 100 ps intervals from the final 100 ns of each trajectory. Energy decomposition provided per-residue contributions, highlighting the residues most critical for ligand stabilization.
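The snapshot selection for MM-GBSA follows directly from the trajectory settings: with frames saved every 10 ps, a 200 ns run contains 20,000 frames, and the final 100 ns sampled every 100 ps yields 1000 frames. The sketch below makes that arithmetic explicit; the 1-based frame numbering and the choice to start at the first frame of the window are assumptions, and `mmgbsa_frames` is a hypothetical helper.

```python
def mmgbsa_frames(n_frames, save_every_ps, window_ns, stride_ps):
    """Return 1-based frame indices covering the last window_ns at stride_ps spacing."""
    window_frames = int(window_ns * 1000 / save_every_ps)  # frames inside the window
    interval = int(stride_ps / save_every_ps)              # take every interval-th frame
    start = n_frames - window_frames + 1                   # first frame of the window
    return list(range(start, n_frames + 1, interval))

frames = mmgbsa_frames(n_frames=20000, save_every_ps=10, window_ns=100, stride_ps=100)
start_frame, end_frame, n_selected = frames[0], frames[-1], len(frames)
```

Here the selection runs from frame 10001 in steps of 10 frames, giving the 1000 snapshots described in the text.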

3. Results and Discussion

3.1. Rationale and Bioactivity Landscape of AKT1 Inhibitors

The serine/threonine kinase AKT1 (RAC-α protein kinase) is a central effector of the PI3K/AKT/mTOR signaling pathway, which governs cell growth, survival, glucose uptake, and lipid metabolism. Dysregulation of this pathway has been strongly implicated in metabolic disorders, particularly obesity and MASLD, where chronic activation of AKT signaling contributes to hepatic steatosis, insulin resistance, and progression toward steatohepatitis. Despite its well-established role in cancer biology, the therapeutic potential of AKT inhibition in metabolic diseases remains underexplored. Traditional drug discovery approaches face several limitations, including the heterogeneity of experimental IC₅₀ measurements, the scarcity of selective inhibitors with favorable pharmacokinetic profiles, and the lack of systematic computational pipelines to prioritize candidate molecules. Therefore, the present study was designed to address these gaps by integrating ligand-based modeling and advanced molecular simulations. To this end, we implemented an integrated QSAR-guided virtual screening and GaMD workflow aimed at identifying potent AKT inhibitors with translational relevance for obesity and MASLD therapy. A curated data set of 2711 structurally standardized compounds with validated IC₅₀ values served as the foundation for QSAR modeling. These models enabled the capture of structure–activity patterns and facilitated the prioritization of novel inhibitors through virtual screening of large chemical libraries. To complement QSAR predictions, molecular docking using AutoDock Vina (via UCSF Chimera) was applied to assess the binding modes and interaction networks of the top-ranked candidates. Recognizing that conventional molecular dynamics may not sufficiently capture conformational transitions of the kinase, we employed GaMD to enhance conformational sampling and evaluate the stability of ligand–protein complexes. 
This strategy ensures predictive accuracy through QSAR, coupled with atomic-level insight from GaMD, providing a robust platform for rational inhibitor discovery. Furthermore, to gain insight into the overall composition of the curated data set, compounds were classified into active (n = 2079), intermediate (n = 414), and inactive (n = 218) categories based on their IC₅₀ values (Figure 1a). The predominance of active molecules underscores the strong experimental focus on AKT inhibition within available bioactivity data. Analysis of molecular weight and lipophilicity (Log P) further revealed that most compounds clustered within drug-like chemical space, with active compounds distributed across a broad range of pIC₅₀ values (Figure 1b). These findings confirm that the data set not only meets the requirements for robust QSAR modeling but also provides a chemically diverse pool of AKT inhibitors suitable for downstream virtual screening and dynamic simulations.

Figure 1. Distribution and molecular property analysis of the curated AKT1 data set. (a) Bioactivity class distribution of compounds, categorized as active (n = 2,079), intermediate (n = 414), and inactive (n = 218) based on IC₅₀ values. (b) Scatter plot of molecular weight versus Log P, with compounds colored by bioactivity class and scaled by pIC₅₀ values.

3.2. Molecular Property Distribution across Bioactivity Classes

To further evaluate the drug-likeness and chemical diversity of the curated data set, key molecular properties were compared across active, intermediate, and inactive classes (Figure 2). Boxplot analysis of Log P showed that the majority of compounds, regardless of activity class, clustered within the optimal drug-like range (Log P 1–5), although active compounds exhibited slightly higher median Log P values than inactive compounds. The number of HBA and HBD also revealed class-specific trends. Active compounds generally contained fewer HBDs and HBAs than inactive compounds, suggesting that reduced polarity may favor stronger interaction with the hydrophobic regions of the AKT1 binding pocket. Interestingly, inactive compounds displayed broader distributions with multiple outliers, consistent with their limited bioactivity and potentially suboptimal binding profiles. In terms of MW, most active compounds were within the Lipinski threshold (<500 Da), whereas inactive compounds showed a tendency toward higher MW values, including several outliers above 900 Da. These observations indicate that active AKT1 inhibitors typically align with drug-like chemical space, while deviations in polarity and size may contribute to reduced inhibitory potency in intermediate and inactive classes.

Figure 2. Comparative distribution of key molecular properties across bioactivity classes in the AKT1 data set. Boxplots depict (a) lipophilicity, (b) hydrogen bond acceptors, (c) hydrogen bond donors, and (d) molecular weight.

3.3. Model Performance Evaluation Using Different Molecular Fingerprints

To establish a predictive framework for AKT1 inhibition, Random Forest models were developed using 12 classes of 2D molecular fingerprints. Model performance was evaluated using RMSE, R², MAE, and RM² values (Table 1). Among all fingerprints, the CDK Extended fingerprints and FingerPrinter-based encodings produced the most robust models, achieving an R² of 0.96 ± 0.00, with the lowest RMSE (0.28 ± 0.01) and MAE (0.20 ± 0.00). These fingerprints also maintained high RM² values (0.87 ± 0.00), confirming their superior predictive stability. The CDK GraphOnly fingerprints also performed strongly, with an R² of 0.92 ± 0.00, RMSE of 0.38 ± 0.01, and MAE of 0.28 ± 0.01, while AtomPairs2D_count fingerprints achieved an R² of 0.91 ± 0.00, RMSE of 0.42 ± 0.01, and MAE of 0.31 ± 0.01. These results highlight their effectiveness in capturing atom connectivity and pairwise relationships for QSAR modeling. In contrast, Estate fingerprints showed weaker performance, with an R² of 0.72 ± 0.01, higher RMSE (0.71 ± 0.01), and MAE (0.56 ± 0.01), indicating limited reliability. Similarly, Substructure fingerprints produced modest results (R² = 0.76 ± 0.01, RMSE = 0.65 ± 0.01, MAE = 0.51 ± 0.01), suggesting that fragment-level features alone were insufficient to capture bioactivity trends. Intermediate performance was observed for Klekota-Roth and MACCS fingerprints, with R² values in the range of 0.86–0.89, RMSE values around 0.44–0.49, and RM² values between 0.75 and 0.79. These fingerprints provided moderate predictive power, reinforcing their utility as supplementary descriptors but not as primary predictors. The comparative performance is further visualized in Supporting Figure S1, which illustrates the relative strengths of each fingerprint class across different metrics. 
Collectively, these results highlight that Extended and FingerPrinter-based encodings deliver the most accurate and stable models, while Estate and Substructure fingerprints contribute less predictive reliability.

Table 1. Performance Values Are Reported as Mean ± Standard Deviation across Repeated Cross-Validation Runs

S. No	Fingerprint	RMSE (Mean ± SD)	R² (Mean ± SD)	MAE (Mean ± SD)	RM² (Mean ± SD)
1	AtomPairs2D_fingerPrintCount	0.42 ± 0.01	0.91 ± 0.00	0.31 ± 0.01	0.80 ± 0.01
2	AtomPairs2D_fingerPrinter	0.54 ± 0.01	0.84 ± 0.01	0.41 ± 0.01	0.72 ± 0.01
3	Substructure_fingerPrintCount	0.44 ± 0.01	0.89 ± 0.00	0.33 ± 0.01	0.79 ± 0.01
4	Substructure_fingerPrinter	0.65 ± 0.01	0.76 ± 0.01	0.51 ± 0.01	0.64 ± 0.01
5	CDK Extended_fingerPrinter	0.28 ± 0.01	0.96 ± 0.00	0.20 ± 0.00	0.87 ± 0.00
6	CDK FingerPrinter	0.28 ± 0.01	0.96 ± 0.00	0.20 ± 0.00	0.87 ± 0.00
7	Estate_FingerPrinter	0.71 ± 0.01	0.72 ± 0.01	0.56 ± 0.01	0.59 ± 0.01
8	CDK GraphOnly_FingerPrinter	0.38 ± 0.01	0.92 ± 0.00	0.28 ± 0.01	0.83 ± 0.01
9	KlekotaRoth_FingerprintCount	0.45 ± 0.01	0.89 ± 0.01	0.34 ± 0.01	0.78 ± 0.01
10	KlekotaRoth_FingerPrinter	0.49 ± 0.01	0.86 ± 0.01	0.37 ± 0.01	0.75 ± 0.01
11	MACCS_FingerPrinter	0.44 ± 0.01	0.89 ± 0.01	0.33 ± 0.01	0.79 ± 0.01
12	Pubchem_FingerPrinter	0.41 ± 0.01	0.91 ± 0.00	0.31 ± 0.01	0.81 ± 0.01

No statistical significance testing was applied, as models were trained and evaluated on the same dataset and are compared descriptively for model selection.

To exclude chance correlations and rigorously assess the robustness of the Random Forest QSAR models, Y-scrambling and learning-curve analyses were performed for all fingerprint representations. As summarized in Table 2, Y-scrambling produced consistently low and negative mean R² values across all fingerprint classes (Yscr R²_mean ranging from −0.13 to −0.21) with small standard deviations (≤0.05), clearly demonstrating that the predictive performance of the original models did not arise from random correlations between molecular descriptors and biological activity. The external test-set performance further confirmed the robustness and generalizability of the developed models. Substructure-based fingerprints and E-State fingerprints achieved the highest predictive accuracy, with R²_test values of 0.97 and low prediction errors (RMSE_test ≈ 0.21 and MAE_test ≈ 0.12–0.13). Klekota–Roth and AtomPairs2D fingerprints also exhibited strong predictive performance, yielding R²_test values between 0.93 and 0.96 with moderate RMSE and MAE values. MACCS and PubChem fingerprints maintained stable and reliable behavior, both achieving R²_test values of 0.92, indicating good generalization to unseen compounds. In contrast, CDK Graph-Only, Extended, and FingerPrinter representations showed comparatively weaker predictive performance on the external test set, with lower R²_test values (0.79–0.85) and higher RMSE_test and MAE_test values, suggesting reduced descriptive power for AKT1 inhibitor activity within the present data set. Nevertheless, their Y-scrambling results remained strongly negative, confirming that even these models were not driven by chance correlations. Additionally, learning-curve analysis (Supporting Figure S2) revealed a gradual convergence between training and validation R² values as the training set size increased, indicating stable learning behavior and the absence of severe overfitting.

Table 2. External Validation and Y-Scrambling Results of Random Forest QSAR Models Using Different Molecular Fingerprints.

Fingerprint	R²_test	RMSE_test	MAE_test	Yscr_R²_mean	Yscr_R²_sd
Substructure_fingerPrintCount	0.97	0.21	0.12	–0.13	0.05
Estate_FingerPrinter	0.97	0.21	0.12	–0.21	0.05
Substructure_fingerPrinter	0.97	0.21	0.13	–0.20	0.05
KlekotaRoth_FingerprintCount	0.96	0.24	0.16	–0.14	0.05
AtomPairs2D_fingerPrinter	0.94	0.29	0.18	–0.19	0.05
AtomPairs2D_fingerPrintCount	0.94	0.29	0.19	–0.14	0.04
KlekotaRoth_FingerPrinter	0.93	0.33	0.22	–0.16	0.05
MACCS_FingerPrinter	0.92	0.34	0.22	–0.15	0.04
Pubchem_FingerPrinter	0.92	0.34	0.23	–0.14	0.05
CDK GraphOnly_FingerPrinter	0.85	0.48	0.33	–0.17	0.05
CDK Extended_fingerPrinter	0.80	0.57	0.41	–0.13	0.04
CDK FingerPrinter	0.79	0.58	0.42	–0.14	0.04


Test-set metrics (R², RMSE, MAE) demonstrate model predictivity, while negative Y-scrambled R² values confirm robustness and exclude chance correlations.
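The validation logic behind these metrics can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' pipeline: a one-descriptor least-squares fit stands in for the Random Forest models, the data are synthetic, and the helper names (`fit_line`, `y_scramble`) are hypothetical. It computes R², RMSE, and MAE on a held-out set and then repeats the fit with shuffled training activities, which is the essence of Y-scrambling.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination; can be negative for poor models."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Least-squares fit of y = a*x + b (hypothetical stand-in for the QSAR model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return lambda xs: [a * xi + b for xi in xs]


def y_scramble(x_train, y_train, x_test, y_test, n_rounds=50, seed=0):
    """Refit on shuffled training activities; return mean and SD of test-set R²."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y_train[:]
        rng.shuffle(shuffled)  # break the descriptor-activity link
        model = fit_line(x_train, shuffled)
        scores.append(r2_score(y_test, model(x_test)))
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
    return mean, sd


# Synthetic descriptor/activity data (assumption, for illustration only).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [0.5 * xi + 4.0 + rng.gauss(0, 0.3) for xi in x]
x_train, y_train, x_test, y_test = x[:160], y[:160], x[160:], y[160:]

pred = fit_line(x_train, y_train)(x_test)
r2_real = r2_score(y_test, pred)
yscr_mean, yscr_sd = y_scramble(x_train, y_train, x_test, y_test)

print(f"R2_test={r2_real:.2f} RMSE_test={rmse(y_test, pred):.2f} MAE_test={mae(y_test, pred):.2f}")
print(f"Yscr_R2_mean={yscr_mean:.2f} Yscr_R2_sd={yscr_sd:.2f}")
```

A large gap between the real test-set R² and the scrambled mean is the pattern that indicates the model is not fitting noise.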

3.4. QSAR Model Performance Using Molecular Fingerprints

To evaluate the predictive potential of different molecular representations, Random Forest models were constructed using 12 classes of 2D fingerprints. Although several fingerprints, such as Klekota–Roth and Substructure-based encodings, are high dimensional, explicit dimensionality reduction was not applied in order to preserve chemically interpretable features; instead, the intrinsic feature-selection capability of the Random Forest algorithm was relied upon to mitigate redundancy and reduce overfitting. Other well-performing models included GraphOnly and AtomPairs2D fingerprints, which achieved R² values of 0.91–0.92 and relatively low RMSE values (∼0.38–0.42). In contrast, Estate and Substructure fingerprints yielded weaker models, with R² values of 0.72 and 0.76, respectively, and higher error metrics, reflecting limited predictive capability. Moderate results were observed for Klekota–Roth and MACCS fingerprints, which showed R² values in the range of 0.86–0.89 and r²m scores of 0.75–0.79, suggesting stable but less predictive performance compared to Extended and PubChem fingerprints. Further validation using the modified r²m parameter confirmed the stability of the best-performing models (Table ). Both Extended and FingerPrinter fingerprints achieved r²m values of 0.89 (training), 0.85 (cross-validation), and 0.86 (external test set), with minimal Δr²m variations (0.02–0.04). Similarly, PubChem fingerprints displayed consistent results (r²m = 0.88, 0.84, and 0.85 for training, CV, and external sets, respectively). By comparison, the Estate fingerprint model recorded lower robustness, with an external r²m of 0.77, indicating weaker predictive generalization. The visual performance of the models is illustrated in Figure , which presents scatter plots of experimental versus predicted pIC₅₀ values for all 12 fingerprint classes.
The plots show that Extended, FingerPrinter, and PubChem fingerprints yielded predictions tightly clustered along the diagonal line, indicating excellent agreement between predicted and experimental activities. By contrast, broader scatter was observed for Estate and Substructure fingerprints, confirming their reduced predictive reliability. Models based on Klekota–Roth and AtomPairs2D fingerprints showed intermediate clustering, in line with their statistical performance metrics. The comparative evaluation is further emphasized in Supporting Figure S1, which provides a heatmap overview of normalized model performance across different metrics. Collectively, these findings indicate that Extended, FingerPrinter, and PubChem fingerprints are the most effective encodings for QSAR modeling of AKT1 inhibitors, while Estate and Substructure fingerprints demonstrate limited predictive utility.

Experimental versus predicted pIC₅₀ values for AKT1 inhibitors using Random Forest models developed with 12 different molecular fingerprint classes. The diagonal line represents perfect correlation. Extended, FingerPrinter, and PubChem fingerprints demonstrated the highest predictive accuracy, with data points clustering tightly around the diagonal, whereas Estate and Substructure fingerprints showed greater dispersion, consistent with their lower performance metrics.
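The text does not state how the modified r²m parameter was computed. As a hedged illustration, the sketch below assumes the commonly used definition r²m = r² × (1 − √|r² − r0²|), where r² is the squared correlation between observed and predicted values and r0² is the determination coefficient of a regression forced through the origin. The pIC₅₀ values are invented for the example, and the function names are hypothetical.

```python
import math


def pearson_r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def r0_squared(obs, pred):
    """Determination coefficient for obs = k * pred (regression through the origin)."""
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = sum(obs) / len(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rm_squared(obs, pred):
    """Assumed definition: r2m = r2 * (1 - sqrt(|r2 - r0^2|))."""
    r2 = pearson_r2(obs, pred)
    r02 = r0_squared(obs, pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


# Invented pIC50 values, for illustration only.
observed = [5.1, 6.0, 6.8, 7.5, 8.2]
predicted = [5.3, 5.9, 6.9, 7.3, 8.4]
rm2_example = rm_squared(observed, predicted)
print(f"r2m = {rm2_example:.3f}")
```

Under this definition r²m penalizes models whose predictions correlate with experiment but are offset or scaled, which plain r² would not detect.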

3.5. Feature Importance Analysis of Substructure Fingerprints

To better understand the structural determinants driving model predictions, a feature importance analysis was performed on the Substructure fingerprint class using the Gini index derived from Random Forest models (Figure ). SubFPC refers to individual predefined substructure fingerprint components generated by PaDEL, and the numeric index following SubFPC (e.g., SubFPC275, SubFPC307) identifies a specific chemical substructure pattern encoded in the fingerprint dictionary. Higher Gini index values indicate a stronger contribution of the corresponding substructure to model predictions. The analysis identified a subset of highly influential substructure fragments (e.g., SubFPC275, SubFPC307, SubFPC2, and SubFPC3) that contributed disproportionately to model performance. These features displayed markedly higher Gini index values compared to the majority of fragments, suggesting their strong relevance in capturing the chemical patterns associated with AKT1 inhibition. Interestingly, many of the most important fragments correspond to functional groups and ring systems frequently observed in kinase inhibitor scaffolds, supporting their mechanistic significance. Conversely, the majority of substructure bits contributed only marginally to prediction accuracy, reflecting redundancy in the feature space. This analysis not only highlights the substructures that drive the bioactivity predictions but also adds interpretability to the QSAR models, bridging the gap between statistical prediction and chemical intuition.

Feature importance analysis of the Substructure fingerprint class based on the Gini index from Random Forest models.

3.6. Applicability Domain Analysis

A critical step in QSAR modeling is the evaluation of the applicability domain (AD), which determines whether the model is making predictions within a reliable chemical space. To assess this, a Williams plot was constructed by plotting standardized residuals versus leverage values for the Substructure Count-based Random Forest model (Figure ). The majority of compounds clustered well within the predefined boundaries of standardized residuals (−3 to +3) and below the critical leverage threshold (h* = 0.015), indicating that most predictions were both accurate and reliable. This suggests that the training data adequately covered the relevant chemical space of AKT1 inhibitors, reducing the likelihood of extrapolation-based errors during prediction. Only a limited number of molecules were identified as outliers, either due to residuals outside the acceptable range or leverage values above the warning limit. These outliers may reflect structurally diverse scaffolds not well represented in the training set or may arise from experimental inconsistencies in reported IC₅₀ values. Importantly, their small proportion suggests that they do not compromise the overall robustness of the model. The results of the AD analysis demonstrate that the Random Forest model built with Substructure Count fingerprints possesses a strong predictive domain, with the vast majority of compounds lying within safe and interpretable limits. This validation step provides confidence that the developed models can be reliably applied to external data sets and virtual screening campaigns, ensuring that predictions remain chemically meaningful and statistically valid.

Williams plot (standardized residuals vs leverage) for the Substructure Count-based Random Forest QSAR model. Most compounds lie within the acceptable boundaries (−3 < standardized residuals < +3; leverage < h* = 0.015), confirming the robustness and reliability of the model’s applicability domain, with only a few structural outliers detected.
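The two quantities plotted in a Williams plot can be computed as sketched below. This is a minimal illustration on invented data, not the authors' implementation: leverages are the diagonal of the hat matrix, hᵢ = xᵢ(XᵀX)⁻¹xᵢᵀ, and the warning leverage is taken as h* = 3(p + 1)/n, which is the conventional choice and an assumption here, since the text reports only the resulting value (h* = 0.015). The descriptor matrix, activities, and helper names are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(X):
    """Diagonal of the hat matrix: h_i = x_i (X^T X)^-1 x_i^T."""
    xtx_inv = invert(matmul(transpose(X), X))
    h = []
    for x in X:
        tmp = [sum(xj * xtx_inv[j][k] for j, xj in enumerate(x)) for k in range(len(x))]
        h.append(sum(t * xk for t, xk in zip(tmp, x)))
    return h


def standardized_residuals(y_obs, y_pred):
    """Residuals divided by their sample standard deviation."""
    res = [o - p for o, p in zip(y_obs, y_pred)]
    mean = sum(res) / len(res)
    sd = math.sqrt(sum((r - mean) ** 2 for r in res) / (len(res) - 1))
    return [r / sd for r in res]


# Invented data: 19 compounds on a small descriptor grid plus one structural outlier.
descriptors = [[(i % 5) * 0.5, (i // 5) * 0.5] for i in range(19)]
descriptors.append([8.0, -8.0])
X = [[1.0] + d for d in descriptors]  # leading 1.0 is the intercept column

y_pred = [6.0 + 0.1 * i for i in range(20)]
y_obs = [p + (0.1 if i % 2 == 0 else -0.1) for i, p in enumerate(y_pred)]
y_obs[7] = y_pred[7] + 1.5  # one poorly predicted compound

h = leverages(X)
h_star = 3.0 * len(X[0]) / len(X)  # 3(p + 1)/n; len(X[0]) already counts the intercept
z = standardized_residuals(y_obs, y_pred)

high_leverage = [i for i, hi in enumerate(h) if hi > h_star]
response_outliers = [i for i, zi in enumerate(z) if abs(zi) > 3.0]
print("h* =", round(h_star, 3), "high leverage:", high_leverage,
      "response outliers:", response_outliers)
```

Compounds in `high_leverage` lie outside the descriptor space covered by the training set, whereas those in `response_outliers` are poorly predicted despite being structurally covered; the two sets correspond to the two kinds of outlier described above.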

3.7. Virtual Screening and Molecular Docking

The validated QSAR models were subsequently used to prioritize compounds from external chemical libraries, and the top-ranked candidates were subjected to molecular docking against the AKT1 kinase domain (PDB ID: 4GV1). Docking was performed using AutoDock Vina, and binding affinities were initially compared to the cocrystallized reference ligand to ensure consistency with the experimentally defined binding site. We further included Capivasertib (AZD5363) as an additional comparator because it represents a clinically advanced, mechanistically validated Akt inhibitor. This benchmarking step provides a stronger context for interpreting whether newly screened hits achieve interaction profiles and predicted affinities comparable to those of a known inhibitor, rather than relying solely on the crystallographic ligand (Figure ). The docking results (Table 3) indicated that several screened compounds achieved stronger predicted binding affinities than the cocrystallized reference ligand (−7.09 kcal/mol). Among the top-ranked hits, NPC134413 (−9.42 kcal/mol) exhibited the most favorable docking score, followed closely by NPC277306 (−9.36 kcal/mol), NPC469442 (−9.07 kcal/mol), and NPC469444 (−9.05 kcal/mol). These NPC derivatives outperformed the crystallographic reference by approximately 2.0–2.3 kcal/mol, which is typically considered meaningful in structure-based ranking and suggests a higher likelihood of stable pocket complementarity. The ZINC-derived candidates also showed competitive performance: ZINC01275873 (−7.51 kcal/mol), ZINC13147288 (−7.32 kcal/mol), and ZINC72265942 (−7.27 kcal/mol) yielded affinities comparable to or slightly stronger than the reference ligand. Although their docking energies were modest relative to the NPC series, these ZINC scaffolds adopted distinct orientations and interaction patterns, offering useful chemical diversity for subsequent optimization.
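The comparison against the reference ligand is simple arithmetic, sketched below with the docking scores reported in this section. The variable names are illustrative; negative differences mean a more favorable predicted affinity than the cocrystallized ligand.

```python
# Docking scores (kcal/mol) as reported in this section.
reference_score = -7.09  # cocrystallized 4gv1 ligand
scores = {
    "NPC134413": -9.42,
    "NPC277306": -9.36,
    "NPC469442": -9.07,
    "NPC469444": -9.05,
    "ZINC01275873": -7.51,
    "ZINC13147288": -7.32,
    "ZINC72265942": -7.27,
    "Capivasertib (AZD5363)": -6.69,
}

# Difference to the reference; more negative is more favorable.
delta = {name: round(score - reference_score, 2) for name, score in scores.items()}

# Rank from most to least favorable docking score.
ranked = sorted(scores, key=scores.get)

for name in ranked:
    print(f"{name:24s} score={scores[name]:6.2f}  delta_vs_reference={delta[name]:+.2f}")
```

Docking scores are coarse estimates, so differences of a few tenths of a kcal/mol, as seen for the ZINC compounds, should be read as comparable to the reference rather than as a clear improvement.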

Predicted binding modes of the cocrystallized reference ligand, the clinically relevant AKT inhibitor Capivasertib (AZD5363), and the top-ranked screened compounds within the ATP-binding site of AKT1. Ligands are shown as green sticks, while key stabilizing residues are displayed as labeled sticks. Hydrogen bonds and ionic interactions with critical catalytic and hinge-region residues, including Met281, Glu234, Asp292, and Lys158, are highlighted, illustrating conserved binding motifs and enabling direct comparison between screened hits and established AKT1 inhibitors.

Table 3. Docking Scores and Interaction Details of the Reference Ligand and Top Virtual Screening Hits.

		Interaction Details
Compound ID and Name	Score (kcal/mol)	Ligand atom	Atom no.	Receptor atom	Residue	Residue no.	Interaction	Distance (Å)	E (kcal/mol)
4gv1-reference	–7.09	C	16	SD	MET	281	H-donor	3.77	–0.5
		N	18	OE2	GLU	234	H-donor	2.67	–17
		N	18	SD	MET	281	H-donor	3.58	–7.7
		C	19	SD	MET	281	H-donor	3.92	–0.6
		N	18	OE2	GLU	234	Ionic	2.67	–7.1
NPC134413	–9.42	C	4	SD	MET	227	H-donor	3.61	–0.5
		C	12	SD	MET	281	H-donor	4.13	–1
		N	13	OE2	GLU	234	H-donor	2.84	–15.7
		C	21	SD	MET	281	H-donor	3.72	–1.2
		C	21	OD2	ASP	292	H-donor	3.25	–1.3
		N	13	OE2	GLU	234	Ionic	2.84	–5.7
NPC277306	–9.36	N	6	OD2	ASP	292	H-donor	2.76	–6.4
		O	21	SD	MET	281	H-donor	3.85	–0.9
		O	1	NZ	LYS	179	H-acceptor	2.81	–8.9
		N	6	OD2	ASP	292	Ionic	2.76	–6.3
		6-ring		CG1	VAL	164	pi-H	4.13	–0.5
		6-ring		CG1	VAL	164	pi-H	4.14	–0.5
		6-ring		CG2	VAL	164	pi-H	4.26	–0.5
NPC469442	–9.07	N	3	SD	MET	281	H-donor	3.52	–1.5
		C	5	O	LEU	156	H-donor	3.14	–0.6
		N	6	OE2	GLU	234	H-donor	2.87	–4
		O	16	N	ALA	230	H-acceptor	3.02	–1.3
		N	6	OE1	GLU	234	Ionic	3.95	–0.6
		N	6	OE2	GLU	234	Ionic	2.87	–5.4
NPC469444	–9.05	CL	1	O	GLU	228	H-donor	3.75	–0.5
		C	10	SD	MET	281	H-donor	3.57	–0.5
		O	12	SD	MET	281	H-donor	3.69	–0.6
		6-ring		CB	PHE	161	pi-H	3.99	–0.5
		6-ring		CD2	PHE	161	pi-H	4.05	–0.8
ZINC01275873	–7.51	N	31	O	GLY	162	H-donor	3.29	–0.8
		O	41	O	GLU	278	H-donor	2.76	–2
		C	42	OD2	ASP	292	H-donor	3.21	–0.8
		C	44	SD	MET	281	H-donor	3.51	–1.2
		O	54	N	LYS	158	H-acceptor	3.51	–0.5
ZINC13147288	–7.32	O	49	OD1	ASP	292	H-donor	2.81	–2.9
		C	55	OE2	GLU	234	H-donor	3.09	–0.7
		O	61	OE2	GLU	234	H-donor	3.08	–1.3
		O	54	N	LYS	158	H-acceptor	2.96	–2.6
		N	34	OD1	ASP	274	Ionic	3.62	–1.5
		N	34	OD2	ASP	274	Ionic	3.11	–3.8
ZINC72265942	–7.27	O	25	OD1	ASN	279	H-donor	3.04	–0.5
		N	32	O	GLY	162	H-donor	3.27	–1.1
		C	43	OE2	GLU	234	H-donor	3.33	–1.2
		O	21	NZ	LYS	276	H-acceptor	2.84	–6.9
		O	46	N	ASP	439	H-acceptor	3.26	–1.4
Capivasertib (AZD5363)	–6.69	C	12	SD	MET	281	H-donor	3.77	–0.5
		N	16	OE2	GLU	198	H-donor	2.67	–17
		N	16	SD	MET	281	H-donor	3.58	–7.7
		C	14	SD	MET	281	H-donor	3.92	–0.6
		N	16	OE2	ALA	230	H-acceptor	2.67	–7.1


At the interaction level, the docked poses consistently localized within the ATP-binding pocket and reproduced canonical kinase-binding motifs. In particular, recurrent contacts were observed with residues that shape the catalytic cleft and recognition region, including Met281, Glu234, Asp292, and Lys158, supporting the plausibility of the predicted binding modes. NPC134413, the best-scoring ligand, established multiple hydrogen bonds involving Met227, Met281, and Glu234, together with an ionic interaction with Asp292, providing a structural explanation for its enhanced docking score. NPC277306 also engaged Asp292 and Lys179 through hydrogen bonding and was additionally anchored by π–H interactions with Val164, consistent with stable positioning within the catalytic groove. NPC469442 formed a dense interaction network involving Met281, Glu234, and Ala230, including ionic stabilization with Glu234, while NPC469444 combined hydrogen bonds to Met281/Glu234 with aromatic stabilization via π–π interactions with Phe161, indicating a balanced electrostatic and hydrophobic contribution to binding. In comparison, the ZINC-derived ligands displayed partially shifted contact patterns that still converged on key pocket determinants. ZINC01275873 formed hydrogen bonds with Lys276, Asn279, and Glu234 and further engaged Asp439 through ionic stabilization, suggesting productive exploitation of adjacent subpocket features. ZINC13147288 interacted with Asp292, Glu234, and Lys179 with additional ionic reinforcement, whereas ZINC72265942 engaged Asn279 and Glu234 while extending contacts toward Asp439, together forming a broader interaction footprint. The cocrystallized ligand predominantly contacted Met281 and Glu234, with hydrogen bond distances in the range of 2.6–3.9 Å, but showed weaker overall affinity than several screened hits. 
Importantly, inclusion of Capivasertib (AZD5363) provided an additional, clinically relevant benchmark: its binding pose occupied the same ATP pocket and exhibited the expected anchoring interactions within the catalytic cleft (Figure ). Observing that multiple screened candidates not only aligned with the crystallographic binding mode but also achieved predicted affinities and interaction richness comparable to a known inhibitor strengthens confidence that the virtual screening workflow is prioritizing pharmacologically meaningful chemotypes.

3.8. Protein–Ligand Dynamic Stability Analysis

To further validate the stability of the docked complexes, we performed 200 ns GaMD simulations and monitored the RMSD of protein–ligand complexes (Figure ). The RMSD trajectories provided valuable insights into conformational stability and the ability of each ligand to maintain a consistent binding orientation within the AKT1 active site. The 4gv1 (PDB code) protein displayed moderate fluctuations, stabilizing around 2.5–3.0 Å after ∼80 ns (Figure a), which served as a baseline for comparison. Among the screened compounds, NPC134413, NPC277306 and NPC469442 demonstrated remarkable stability, with RMSD values consistently below 2.0 Å throughout the simulation (Figure b–d), indicating tight binding and minimal conformational drift. NPC469444 showed slightly higher deviations (∼2.5 Å) but still maintained an overall stable trajectory (Figure e). The ZINC derivatives presented distinct stability trends. ZINC13147288 and ZINC72265942 both exhibited stable RMSD patterns (below 2.0 Å) comparable to the best NPC compounds, whereas ZINC01275873 showed a gradual increase in RMSD after ∼120 ns, reaching values near 3.0 Å, suggestive of partial rearrangements in binding mode (Figure f–h). The benchmark inhibitor Capivasertib (AZD5363) demonstrated intermediate stability, with RMSD values around 2.0–2.5 Å (Figure i). These results collectively highlight that several screened hits, particularly NPC134413, NPC277306, NPC469442, and ZINC72265942, are capable of maintaining stable interactions with AKT1 over extended time scales. Complementary ligand RMSD plots (Supporting Figure S3) further supported these findings. Ligand trajectories for NPC134413, NPC277306, NPC469442, ZINC13147288, and ZINC72265942 remained stable (RMSD ∼1–3 Å), reinforcing their ability to sustain consistent binding poses. In contrast, ZINC01275873 exhibited higher ligand RMSD values (∼6–8 Å after 120 ns), correlating with its protein RMSD deviations and suggesting reduced conformational stability. 
Together, the complex and ligand RMSD analyses confirmed that multiple hits demonstrated superior stability compared to the reference ligand, strengthening confidence in their potential as lead candidates for AKT1 inhibition.

Time evolution of backbone RMSD for AKT1 complexes during 200 ns GaMD simulations, including the reference structure (a), top-ranked NPC (b–e) and ZINC hits (f–h), and the benchmark inhibitor Capivasertib (AZD5363) (i). Lower and more stable RMSD profiles indicate enhanced structural stability and sustained binding of prioritized ligands within the AKT1 active site.
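For readers unfamiliar with the metric, RMSD is the square root of the mean squared displacement of a set of atoms relative to a reference structure. The sketch below is a minimal illustration on invented coordinates; it assumes each frame has already been superposed on the reference (trajectory analysis tools normally perform this fit first), and the function name is hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the frame is already superposed on the reference; no fitting is done here.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for atom, ref_atom in zip(coords, reference)
                  for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / len(reference))


# Invented three-atom example (coordinates in Å).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]  # rigid 1 Å shift along x

# A toy "trajectory": the reference, a half-shifted frame, and the shifted frame.
trajectory = [reference,
              [(x + 0.5, y, z) for x, y, z in reference],
              shifted]
rmsd_series = [rmsd(frame, reference) for frame in trajectory]
print([round(v, 2) for v in rmsd_series])
```

A plateau in such a series, rather than its absolute value alone, is what indicates that a complex has settled into a stable conformation.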

3.9. Residue Flexibility and Secondary Structure Analysis

To gain further insights into the dynamic behavior of AKT1–ligand complexes, we analyzed the root-mean-square fluctuation (RMSF) of Cα atoms across the 200 ns GaMD trajectories (Figure ). Overall, RMSF profiles indicated stable backbone behavior for most complexes, with average fluctuations primarily within the 0.5–2.0 Å range. Localized peaks were observed in flexible loop regions, particularly spanning residues 262–302 and 392–442, corresponding to surface-exposed loops and segments proximal to the activation region. The reference complex (4gv1) exhibited moderate fluctuations in the C-terminal region (residues 392–442) (Figure a), reflecting the intrinsic flexibility of the kinase domain in the absence of strong stabilizing interactions. In contrast, complexes bound to NPC134413 (Figure b), NPC277306 (Figure c), and NPC469442 (Figure d) displayed reduced RMSF amplitudes across these regions, indicating effective stabilization of the activation loop and catalytic cleft. NPC469444 (Figure e), despite favorable docking and MM-GBSA scores, displayed slightly elevated fluctuations in distal loop regions, suggesting localized flexibility without global destabilization. These ligands suppressed excessive backbone mobility, particularly around residues involved in ATP binding and catalysis. Among the ZINC-derived compounds, ZINC13147288 (Figure f) and ZINC72265942 (Figure g) showed moderate stabilization of backbone fluctuations, whereas ZINC01275873 (Figure h) exhibited higher flexibility in the 412–442 residue region. This increased mobility is consistent with its less persistent interaction network and relatively higher ligand RMSD observed in Figure S3. Importantly, the clinically validated AKT inhibitor Capivasertib (AZD5363) (Figure i) exhibited an RMSF profile closely resembling those of the top-performing NPC ligands. 
Fluctuations remained consistently low across the kinase core, with only minor peaks in terminal loop regions, supporting its role as a stabilizing reference compound and validating the simulation framework used in this study. Secondary structure mapping (Figure j) further confirmed that, despite residue-level fluctuations, the overall α-helical and β-strand architecture of AKT1 remained conserved across all systems. Notably, NPC134413, NPC277306, and Capivasertib preserved the integrity of key α-helical segments within the catalytic domain, indicating that their binding restricts excessive conformational plasticity while maintaining structural order.

Root mean square fluctuation (RMSF) profiles of AKT1 Cα atoms over 200 ns GaMD simulations for the reference complex (a), top-ranked NPC (b–e) and ZINC ligands (f–h), and the benchmark inhibitor Capivasertib (AZD5363) (i), together with secondary structure mapping (j). Lower fluctuations in key catalytic and activation-loop regions indicate enhanced stabilization of AKT1 upon binding of prioritized inhibitors.
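Whereas RMSD tracks one value per frame, RMSF yields one value per atom: the root-mean-square deviation of that atom from its own time-averaged position. The sketch below illustrates the calculation on an invented two-atom trajectory and assumes the frames are already aligned; the function name is hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames of (x, y, z) tuples.

    Assumes all frames are aligned to a common reference beforehand.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msd))
    return values


# Invented example (Å): atom 0 is rigid, atom 1 oscillates by ±1 Å along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
rmsf_values = rmsf(toy_trajectory)
print(rmsf_values)
```

Applied to Cα atoms, peaks in the resulting profile mark flexible loops, and a ligand that lowers those peaks is interpreted as stabilizing the corresponding region.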

3.10. Compactness of Protein–Ligand Complexes

The radius of gyration (RoG) serves as an important metric for monitoring the compactness of protein–ligand complexes, providing insight into whether ligand binding maintains the structural integrity of the protein fold or induces conformational loosening. For AKT1, a kinase with a relatively stable catalytic domain, stable RoG values indicate that ligand association does not compromise the overall fold, while larger fluctuations can point to destabilization or partial unfolding. The reference AKT1 complex (4gv1) displayed RoG values fluctuating between ∼20.5 and 21.6 Å, consistent with its crystallographic stability and serving as a baseline for comparison (Figure a). NPC134413, NPC277306, and NPC469442 maintained remarkably stable RoG profiles throughout the 200 ns simulations (Figure b–d), with average values clustered between 20.6 and 21.2 Å. The absence of significant upward drifts suggests that these ligands enhance the compactness of AKT1 and reinforce intramolecular contacts, likely through persistent interactions with key catalytic residues such as Met281, Glu234, and Asp292. This stabilization was particularly pronounced for NPC134413, which exhibited minimal RoG fluctuations, indicating a strong anchoring effect within the ATP-binding site. NPC469444, despite favorable docking energetics, showed a gradual increase in RoG after ∼120 ns, reaching ∼21.6 Å (Figure e). This behavior suggests localized flexibility within the kinase structure, potentially reflecting transient ligand repositioning or modulation of secondary structural elements near the binding pocket, while not inducing global destabilization. The ZINC-derived ligands demonstrated more heterogeneous RoG behaviors. ZINC01275873 exhibited a progressive increase in RoG, stabilizing near ∼21.7 Å (Figure f), indicative of moderate loosening of the tertiary structure, possibly due to its alternative interaction pattern involving residues such as Asp439 and Lys276 rather than the canonical hinge region.
In contrast, ZINC13147288 and ZINC72265942 maintained relatively stable RoG values (∼20.6–21.3 Å), closely resembling both the reference and Capivasertib complexes (Figure g–i). Importantly, the clinically validated AKT inhibitor Capivasertib (AZD5363) displayed a RoG profile comparable to the top-performing NPC ligands, remaining stable within the ∼20.7–21.3 Å range over the full simulation. This close agreement with NPC134413 and NPC469442 supports the reliability of the simulation protocol and confirms that the newly identified compounds achieve a level of global structural stabilization similar to a benchmark AKT inhibitor.

Radius of gyration (RoG) profiles of AKT1–ligand complexes over 200 ns MD simulations for the reference complex (a), top-ranked NPC (b–e) and ZINC ligands (f–h), illustrating global compactness and structural stability. NPC134413, NPC277306, NPC469442, and Capivasertib (i) maintain stable RoG values comparable to or lower than the reference, indicating preserved protein fold and enhanced stabilization upon ligand binding.
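The radius of gyration is the mass-weighted root-mean-square distance of the atoms from their common center of mass. The sketch below shows the calculation for a single frame on an invented four-atom system; the function name and the unit masses are illustrative assumptions.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for one frame of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Center of mass.
    com = [sum(m * atom[k] for m, atom in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(m * sum((atom[k] - com[k]) ** 2 for k in range(3))
                      for m, atom in zip(masses, coords))
    return math.sqrt(weighted_sq / total_mass)


# Invented example: four unit-mass atoms placed 1 Å from the origin.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
unit_masses = [1.0, 1.0, 1.0, 1.0]
rog_square = radius_of_gyration(square, unit_masses)

# Doubling every distance from the center doubles the RoG (a "looser" structure).
expanded = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in square]
rog_expanded = radius_of_gyration(expanded, unit_masses)
print(rog_square, rog_expanded)
```

Evaluating this quantity frame by frame gives profiles of the kind discussed above, where an upward drift signals loosening of the fold.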

3.11. Hydrogen Bonding Stability Analysis

The stability of hydrogen bonding and other noncovalent interactions during molecular dynamics simulations provides critical insight into the persistence of ligand binding and the likelihood of sustained inhibitory activity. The interaction profiles for the top-ranked ligands demonstrated clear differences in the frequency and duration of contacts with key AKT1 residues. The 4gv1 reference complex exhibited only transient interactions, mainly with Asp439, Lys159, and Ala231 (Figure a), which were sporadic and short-lived. This finding is consistent with its weaker docking affinity (−7.09 kcal/mol) and moderate stabilization observed in RMSD and RoG analyses. In contrast, NPC134413 maintained continuous hydrogen bonding with residues such as Lys277, Val265, Gly160, and Asp439 throughout the 200 ns trajectory (Figure b). The persistence of these contacts, particularly with Lys277 and Asp439, is consistent with its superior docking score (−9.42 kcal/mol) and exceptional stability during MD. Similarly, NPC277306 exhibited sustained interactions with Gly163, Val164, Lys180, and Thr161 (Figure c), highlighting its ability to anchor across both hydrophilic and hydrophobic hotspots in the ATP-binding pocket. NPC469442 and NPC469444 also showed robust interaction patterns, with residues such as Glu335, Lys277, Thr161, and Asn280 forming hydrogen bonds that were consistently maintained across the simulation window (Figure d–e). These stable contact networks explain the minimal RMSD deviations and compact RoG profiles observed for these complexes. The ZINC-based ligands demonstrated comparatively fewer persistent contacts but still displayed interaction stability. ZINC01275873 engaged residues Lys180, Ala231, and Asp293 consistently, albeit with fewer simultaneous contacts than the NPC ligands (Figure f). ZINC13147288 and ZINC72265942 showed less extensive but reproducible interactions with Asp439, Glu335, and Lys180 (Figure g,h), reflecting alternative anchoring strategies.
While their binding networks were less robust, the presence of persistent hydrogen bonds supports their role as secondary scaffolds for inhibitor optimization. For comparison, the clinically relevant AKT inhibitor Capivasertib (AZD5363) displayed more persistent interactions than the cocrystallized ligand, particularly with hinge-region and catalytic residues, including Lys179, Ala230, and Glu234 (Figure i). However, these contacts showed intermittent disruption over the trajectory, suggesting moderate dynamic stability relative to the top-ranked screened compounds. This behavior is in agreement with its role as a reversible ATP-competitive inhibitor and provides a meaningful benchmark for evaluating the interaction persistence of newly identified hits. Overall, Figure highlights a clear correlation between docking affinity, MD-derived stability, and the persistence of intermolecular interactions. NPC134413, NPC277306, NPC469442, and NPC469444 demonstrated the strongest and most continuous interaction patterns, reinforcing their prioritization as lead candidates. ZINC-based compounds, although displaying weaker networks, contributed structural diversity and additional binding strategies that could be exploited for scaffold optimization in future drug design efforts.

Time-resolved interaction analysis of AKT1–ligand complexes from 200 ns GaMD simulations, showing the persistence of hydrogen bonds and key noncovalent contacts for the reference ligand (a), top-ranked compounds (b–h) and Capivasertib (AZD5363) (i). Continuous interactions with catalytic and hinge-region residues indicate enhanced binding stability of NPC-derived ligands compared with the reference and Capivasertib.
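Interaction persistence of this kind is commonly summarized as an occupancy, the fraction of frames in which a contact is present. The sketch below is a simplified illustration on invented distances: it applies only a donor–acceptor distance cutoff of 3.5 Å, which is an assumption (the text does not state the criteria used, and production analyses usually add an angle criterion), and the function names are hypothetical.

```python
def hbond_occupancy(distances, cutoff=3.5):
    """Fraction of frames whose donor-acceptor distance (Å) is within the cutoff."""
    present = [d <= cutoff for d in distances]
    return sum(present) / len(present)


def longest_continuous_run(distances, cutoff=3.5):
    """Length, in frames, of the longest uninterrupted stretch with the contact present."""
    longest = current = 0
    for d in distances:
        current = current + 1 if d <= cutoff else 0
        longest = max(longest, current)
    return longest


# Invented donor-acceptor distances (Å) for eight consecutive frames.
frame_distances = [2.8, 2.9, 3.1, 4.2, 3.0, 3.6, 2.7, 2.9]

occupancy = hbond_occupancy(frame_distances)
longest = longest_continuous_run(frame_distances)
print(f"occupancy = {occupancy:.2f}, longest continuous run = {longest} frames")
```

Reporting both numbers helps separate a contact that is continuously maintained from one that is frequent but repeatedly broken, the distinction drawn above between the NPC ligands and Capivasertib.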

3.12. PCA of Conformational Dynamics

To further elucidate ligand-induced conformational behavior, PCA was performed on the 200 ns molecular dynamics trajectories of AKT1 complexes, including the cocrystallized reference, Capivasertib (AZD5363), and the top-ranked screened compounds (Figure ). Projection onto the first two principal components (PC1 and PC2) revealed distinct dynamic signatures across the systems. The reference complex (4gv1) sampled a broad conformational space characterized by multiple overlapping clusters, indicating higher structural flexibility and less constrained dynamics. In contrast, NPC134413 and NPC277306 exhibited compact and well-defined clusters, reflecting restricted conformational sampling and enhanced stabilization of the AKT1 kinase domain. NPC469442 and NPC469444 also showed relatively confined distributions with limited transitions between conformational states, consistent with their stable RMSD profiles and persistent interaction networks. Capivasertib (AZD5363), included as a clinically relevant benchmark, displayed intermediate behavior, with broader sampling than NPC ligands but more restricted dynamics than the reference complex. This observation aligns with its known efficacy yet moderate conformational stabilization compared to the top QSAR-prioritized hits. Among the ZINC-derived compounds, ZINC01275873 and ZINC72265942 sampled wider conformational regions, indicative of greater flexibility and less constrained binding, whereas ZINC13147288 demonstrated comparatively tighter clustering, suggesting intermediate stability. Overall, PCA results corroborate docking and MM-GBSA findings, highlighting NPC134413, NPC277306, and NPC469442 as the most effective ligands in stabilizing AKT1 by suppressing large-scale conformational fluctuations, while Capivasertib and ZINC scaffolds provide valuable comparative and structural diversity insights.

PCA of AKT1 conformational dynamics during 200 ns MD simulations, projected onto the first two principal components (PC1 and PC2) for the reference complex, Capivasertib (AZD5363), and top-ranked ligands. Compact clusters indicate restricted motions and enhanced stabilization (NPC ligands), whereas broader distributions reflect increased flexibility and alternative binding dynamics.

3.13. Essential Dynamics and Collective Motions

To further characterize ligand-induced collective motions, essential dynamics analysis was performed based on the principal eigenvectors of Cα atomic fluctuations (Figure ). The porcupine plots reveal that the reference complex (4gv1) undergoes broad, less coordinated motions across the kinase domain, consistent with its higher RMSD and flexible loop regions observed during MD simulations. In contrast, NPC134413 and NPC277306 induced markedly restricted and localized motions, with dominant fluctuations confined to loop regions surrounding the ATP-binding pocket, indicating effective suppression of large-scale conformational drift. NPC469442 and NPC469444 similarly reduced global motions, particularly within the activation loop and adjacent helices, supporting their stable binding modes and favorable RoG profiles. The ZINC-derived ligands exhibited more heterogeneous behavior. ZINC01275873 and ZINC72265942 showed broader collective motions, reflecting increased flexibility, whereas ZINC13147288 displayed moderate restriction, consistent with its intermediate dynamic stability. Capivasertib demonstrated restrained motions relative to the reference complex but remained more flexible than the top NPC candidates, serving as a clinically relevant benchmark. Collectively, essential dynamics analysis confirms that the top-performing ligands effectively dampen large-scale collective motions, complementing PCA and interaction persistence results and reinforcing their potential for sustained AKT1 inhibition.

Essential dynamics analysis of AKT1–ligand complexes showing dominant collective motions derived from Cα atom fluctuations. Porcupine plots illustrate that top NPC ligands and Capivasertib restrict large-scale motions compared to the reference, consistent with enhanced conformational stability and sustained inhibitory binding.

3.14. Binding Free Energy Calculations

To further validate the binding stability and quantify the energetics of ligand–protein interactions, MM-GBSA calculations were performed on equilibrated trajectories of the AKT1–ligand complexes. The total binding free energy (ΔG_TOTAL) was decomposed into van der Waals (ΔE_VDW), electrostatic (ΔE_EL), polar solvation (ΔE_GB), and nonpolar solvation (ΔE_SASA) components (Table 4). Among the tested ligands, NPC469442 exhibited the most favorable binding free energy (−48.54 ± 3.49 kcal/mol), surpassing the cocrystallized reference ligand (4gv1, −44.20 ± 0.06 kcal/mol). This enhanced affinity was primarily driven by strong van der Waals interactions (ΔE_VDW = −67.98 ± 3.57 kcal/mol) and favorable electrostatics (ΔE_EL = −43.49 ± 3.99 kcal/mol), partially offset by polar solvation penalties. Similarly, NPC134413 (−45.85 ± 3.63 kcal/mol) and NPC277306 (−43.69 ± 4.99 kcal/mol) displayed highly favorable binding energies, consistent with their strong docking scores and sustained stability during molecular dynamics simulations. Per-residue MM-GBSA decomposition analysis provided additional insight into the molecular determinants of ligand stabilization within the protein-binding pocket. Across the top-ranked complexes, the hinge residue Met281 consistently emerged as the dominant contributor, providing stabilizing contributions in the range of −2.5 to −4.0 kcal/mol, primarily through backbone hydrogen bonding and hydrophobic packing. Glu234, another key hinge-region residue, contributed −1.8 to −3.2 kcal/mol, reflecting persistent electrostatic and hydrogen-bond interactions observed throughout the simulations. The catalytic residue Asp292 also played a significant role, contributing approximately −1.5 to −3.0 kcal/mol, particularly in the NPC134413 and NPC277306 complexes. Positively charged residues such as Lys277/Lys180 contributed −1.0 to −2.2 kcal/mol, stabilizing the ligands through ionic and long-range electrostatic interactions.
Additional hydrophobic residues lining the ATP pocket, including Ala230 and Val164, provided moderate but consistent van der Waals stabilization (approximately −0.6 to −1.5 kcal/mol). In contrast, NPC469444 showed a less favorable total binding free energy (−28.88 ± 7.21 kcal/mol), despite maintaining interactions with key residues such as Met281 and Glu234, due to higher solvation penalties. The ZINC-derived ligands exhibited comparatively weaker binding free energies (−27.06 to −34.37 kcal/mol), with ZINC01275873 performing best within this group, supported mainly by van der Waals interactions. Nevertheless, these compounds contributed valuable scaffold diversity and distinct interaction patterns that may be advantageous for future optimization. Overall, compared to the known AKT1 inhibitor Capivasertib (AZD5363) with a free energy of −43.1006 ± 3.02 kcal/mol, the combined MM-GBSA energy decomposition and residue-level analysis corroborate the docking and molecular dynamics results, highlighting NPC469442, NPC134413, and NPC277306 as the most promising AKT1 inhibitors identified in this study.
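
The decomposition described above can be written out explicitly. The following is a sketch of the standard MM-GBSA bookkeeping using the component names from this section; it assumes, as the reported values suggest, that no separate entropy term is included in ΔG_TOTAL.

```latex
\Delta G_{\mathrm{TOTAL}}
  = \underbrace{\Delta E_{\mathrm{VDW}} + \Delta E_{\mathrm{EL}}}_{\text{gas-phase interaction energy}}
  + \underbrace{\Delta E_{\mathrm{GB}} + \Delta E_{\mathrm{SASA}}}_{\text{solvation free energy}}

% Worked instance for NPC469442 (kcal/mol):
-67.98 + (-43.49) + 71.59 + (-8.66) = -48.54
```

The worked instance reproduces the reported total for NPC469442, which illustrates how a large favorable electrostatic term can be almost entirely cancelled by the polar solvation penalty.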

Table 4. Binding Free Energy Components (kcal/mol) from MM-GBSA Calculations.

Complex	ΔE_VDW	ΔE_EL	ΔE_GB	ΔE_SASA	ΔG_TOTAL
NPC134413	–53.37 ± 3.74	–7.66 ± 3.17	21.21 ± 3.28	–6.03 ± 0.32	–45.85 ± 3.63
NPC277306	–65.38 ± 5.21	–37.27 ± 5.12	67.72 ± 4.50	–8.76 ± 0.63	–43.69 ± 4.99
NPC469442	–67.98 ± 3.57	–43.49 ± 3.99	71.59 ± 3.55	–8.66 ± 0.39	–48.54 ± 3.49
NPC469444	–49.52 ± 9.02	–14.38 ± 9.89	41.42 ± 9.73	–6.39 ± 1.23	–28.88 ± 7.21
ZINC01275873	–38.88 ± 2.53	–2.17 ± 2.46	11.80 ± 2.41	–5.12 ± 0.22	–34.37 ± 2.69
ZINC13147288	–34.38 ± 2.20	–8.76 ± 2.76	18.47 ± 2.60	–3.92 ± 0.26	–28.59 ± 2.07
ZINC72265942	–30.15 ± 2.72	–5.66 ± 2.24	12.95 ± 1.92	–4.20 ± 0.30	–27.06 ± 2.49
4gv1	–47.0 ± 0.06	–4.7 ± 0.03	13.1 ± 0.03	–5.6 ± 0.005	–44.2 ± 0.06
Capivasertib (AZD5363)	–50.14 ± 3.23	–3.97 ± 1.99	16.99 ± 1.62	–5.99 ± 0.18	–43.10 ± 3.02
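
As a reading aid for the table above, the following Python sketch checks that the four component means add up to the reported totals, ranks the complexes by ΔG_TOTAL, and compares each total with the cocrystallized reference. The names `TABLE4`, `component_sum`, `ranked`, and `delta_vs_reference` are illustrative and not part of the original workflow. Combining the standard deviations in quadrature assumes the two estimates are independent, which the paper does not state.

```python
import math

# (mean, sd) pairs in kcal/mol, transcribed from the table above, in the order
# dE_VDW, dE_EL, dE_GB, dE_SASA, dG_TOTAL. Capivasertib values are rounded
# to two decimals.
TABLE4 = {
    "NPC134413": ((-53.37, 3.74), (-7.66, 3.17), (21.21, 3.28), (-6.03, 0.32), (-45.85, 3.63)),
    "NPC277306": ((-65.38, 5.21), (-37.27, 5.12), (67.72, 4.50), (-8.76, 0.63), (-43.69, 4.99)),
    "NPC469442": ((-67.98, 3.57), (-43.49, 3.99), (71.59, 3.55), (-8.66, 0.39), (-48.54, 3.49)),
    "NPC469444": ((-49.52, 9.02), (-14.38, 9.89), (41.42, 9.73), (-6.39, 1.23), (-28.88, 7.21)),
    "ZINC01275873": ((-38.88, 2.53), (-2.17, 2.46), (11.80, 2.41), (-5.12, 0.22), (-34.37, 2.69)),
    "ZINC13147288": ((-34.38, 2.20), (-8.76, 2.76), (18.47, 2.60), (-3.92, 0.26), (-28.59, 2.07)),
    "ZINC72265942": ((-30.15, 2.72), (-5.66, 2.24), (12.95, 1.92), (-4.20, 0.30), (-27.06, 2.49)),
    "4gv1": ((-47.0, 0.06), (-4.7, 0.03), (13.1, 0.03), (-5.6, 0.005), (-44.2, 0.06)),
    "Capivasertib": ((-50.14, 3.23), (-3.97, 1.99), (16.99, 1.62), (-5.99, 0.18), (-43.10, 3.02)),
}


def component_sum(row):
    """Sum the four component means (all entries except the reported total)."""
    return sum(mean for mean, _ in row[:4])


def ranked():
    """Complex names ordered from most to least favorable dG_TOTAL."""
    return sorted(TABLE4, key=lambda name: TABLE4[name][4][0])


def delta_vs_reference(name, ref="4gv1"):
    """Difference in dG_TOTAL versus a reference complex.

    Returns (difference, combined_sd); the SDs are combined in quadrature,
    assuming independent estimates (an assumption made for illustration).
    """
    (g, s), (g_ref, s_ref) = TABLE4[name][4], TABLE4[ref][4]
    return g - g_ref, math.hypot(s, s_ref)


if __name__ == "__main__":
    for name in ranked():
        row = TABLE4[name]
        diff, sd = delta_vs_reference(name)
        print(f"{name:14s} total={row[4][0]:8.2f} "
              f"sum_of_parts={component_sum(row):8.2f} "
              f"vs_4gv1={diff:+6.2f} (combined sd {sd:.2f})")
```

Run this way, the gap between NPC469442 and 4gv1 comes out at about −4.34 kcal/mol against a combined spread of roughly 3.49 kcal/mol, so the ranking among the top complexes is best read as suggestive rather than decisive.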


4. Conclusions

In this study, we employed an integrated QSAR-guided virtual screening and GaMD framework to identify novel AKT1 inhibitors with potential therapeutic relevance for obesity and MASLD. The curated data set (n = 2711) supported the development of reliable QSAR models, which guided the prioritization of potentially active scaffolds in virtual screening. Molecular docking and molecular dynamics simulations then revealed that several candidates, particularly NPC134413, NPC277306, and NPC469442, exhibited stronger predicted binding affinities than the reference ligand and formed stable interactions with key catalytic residues of AKT1. Enhanced-sampling GaMD simulations, supported by MM-GBSA binding free energy calculations, further confirmed the structural stability and favorable energetic profiles of these complexes. Despite these encouraging computational findings, it is important to acknowledge the inherent limitations of purely in silico approaches. While QSAR modeling, docking, and enhanced molecular dynamics simulations provide valuable tools for compound prioritization and mechanistic insight, they cannot fully capture the biological complexity of metabolic diseases, where tissue-specific signaling, metabolic pathway interactions, and cellular context play critical roles. Consequently, experimental validation using in vitro kinase assays, cell-based functional studies, and metabolic pathway analyses will be essential to confirm the inhibitory activity, selectivity, and therapeutic relevance of the identified compounds. Overall, this study presents a comprehensive computational framework for AKT1 inhibitor discovery and provides a rational foundation for future experimental investigations aimed at developing targeted therapies for metabolic disorders such as obesity and MASLD.

5. Limitations

While the present study highlights the utility of an integrated QSAR-guided virtual screening and GaMD-based molecular dynamics framework, several limitations should be acknowledged. First, although the bioactivity data set was carefully curated and standardized, residual variability in experimental IC₅₀ values may persist due to differences in assay formats and experimental conditions across source studies. Such heterogeneity can introduce uncertainty into QSAR model training despite rigorous filtering and validation procedures. Second, the QSAR models were primarily constructed using binary and count-based molecular fingerprints rather than continuous physicochemical or 3D descriptors, which may limit the capture of more subtle electronic, conformational, or solvation effects that influence kinase inhibition. Third, and most importantly, the present study is entirely computational and does not include direct biochemical or cellular validation. No in vitro AKT1 kinase inhibition assays or pathway-level analyses (e.g., p-AKT/p-mTOR modulation) were performed. Therefore, the identified compounds should be considered computationally prioritized candidates rather than experimentally confirmed inhibitors. In addition, molecular docking, enhanced molecular dynamics simulations, and MM-GBSA free energy calculations are predictive and comparative in nature. These approaches do not fully account for biological complexity, including isoform selectivity, cellular permeability, metabolic stability, off-target effects, and in vivo pharmacokinetics. MM-GBSA calculations are approximate and do not explicitly incorporate entropic contributions or long-time-scale solvent effects, which may affect quantitative accuracy. Furthermore, this study focused exclusively on AKT1, whereas AKT2 and AKT3 also play important roles in metabolic regulation and may influence therapeutic outcomes in obesity and MASLD. Accordingly, experimental validation represents a critical next step.
Planned follow-up studies include in vitro AKT1 kinase IC₅₀ assays, cellular pathway engagement assays (e.g., p-AKT and p-mTOR analysis), isoform selectivity profiling, and preliminary ADME/Tox evaluations. These investigations will be essential to translate the present computational findings into biologically and therapeutically meaningful outcomes.
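
The point about missing entropic contributions can be made concrete with the usual end-point free energy expression. This is a general sketch of the MM-GBSA formalism rather than a calculation performed in this study; the symbols ΔG_bind and −TΔS are introduced here for illustration.

```latex
\Delta G_{\mathrm{bind}} \approx
  \underbrace{\Delta E_{\mathrm{VDW}} + \Delta E_{\mathrm{EL}}
            + \Delta E_{\mathrm{GB}} + \Delta E_{\mathrm{SASA}}}_{\Delta G_{\mathrm{TOTAL}}\ \text{(reported)}}
  \; - \; T\Delta S_{\mathrm{conf}}
```

Because the conformational entropy term −TΔS is typically unfavorable on binding and differs between ligands, omitting it tends to make the reported totals more negative than experimental binding free energies. The values are therefore better suited to relative ranking than to absolute affinity estimates.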

Supplementary Material

ao5c11688_si_001.pdf (358.1 KB, PDF)

Acknowledgments

This work was supported by the Guangdong Medical University Undergraduate Innovation and Entrepreneurship Education Base Project (JDXM2025027 and JDXM2025101), the Guangdong Basic and Applied Basic Research Foundation (2024A1515012465 and 2024A1515012922), the Key Project of Universities in Guangdong Province (2022ZDZX2023), the Discipline Construction Project of Guangdong Medical University (4SG21008G and 4SG24003G), the Medical Scientific Research Foundation of Guangdong Province (B2024043), the Clinical & Basic Technology Innovation Special Program of Guangdong Medical University (GDMULCJC2024155 and GDMUZDCG25016), the Talent Development Foundation of the First Dongguan Affiliated Hospital of Guangdong Medical University (GCC2022007), and the Talent Development Foundation of the First Dongguan Affiliated Hospital of Guangdong Medical University & Foundation of State Key Laboratory of Pathogenesis, and Prevention and Treatment of High Incidence Diseases in Central Asia (SKL-HIDCA-2024-GD8).

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c11688.

Heatmap of Random Forest model performance across fingerprints (Figure S1), learning curves showing training/validation R² vs sample size (Figure S2), and ligand RMSD profiles from 200 ns GaMD simulations (Figure S3) (PDF)

⊥ K.C. and R.W. contributed equally to this work. K.C. and R.W.: Conceptualization, data curation, formal analysis, methodology, software, writing - original draft. D.O., S.W., and Y.C.: Data curation, formal analysis, investigation, methodology, software, writing - original draft, writing - review and editing. L.L. and X.L.: Funding acquisition, project administration, supervision.

The authors declare no competing financial interest.

