Abstract
Drug classification and target identification are crucial yet challenging steps in drug discovery. Existing methods often suffer from inefficiencies, overfitting, and limited scalability. Traditional approaches like support vector machines and XGBoost struggle to handle large, complex pharmaceutical datasets effectively. Deep learning models, while powerful, face challenges with interpretability, computational complexity, and generalization to unseen data. This study addresses these limitations by introducing a novel framework: optSAE + HSAPSO. This framework integrates a stacked autoencoder (SAE) for robust feature extraction with a hierarchically self-adaptive particle swarm optimization (HSAPSO) algorithm for adaptive parameter optimization. This combination delivers superior performance across various classification metrics. Experimental evaluations on datasets from DrugBank and Swiss-Prot demonstrate that optSAE + HSAPSO achieves a high accuracy of 95.52%. Notably, it exhibits significantly reduced computational complexity (0.010 s per sample) and exceptional stability (± 0.003). Compared to state-of-the-art methods, the framework offers higher accuracy, faster convergence, and greater resilience to variability. Furthermore, ROC and convergence analyses confirm its robustness and generalization capability, maintaining consistent performance across both validation and unseen datasets. By leveraging advanced optimization techniques, the framework efficiently handles large feature sets and diverse pharmaceutical data, making it a scalable and adaptable solution for real-world drug discovery applications. However, the method’s performance is dependent on the quality of the training data, and fine-tuning may be necessary for high-dimensional datasets. Despite these limitations, the optSAE + HSAPSO framework demonstrates transformative potential, significantly reducing computational overhead while improving classification accuracy and reliability. 
This work advances the field of pharmaceutical informatics by presenting a reliable and efficient framework for drug classification and target identification. These findings open promising avenues for future research, including extending the framework to other domains such as disease diagnostics or genetic data classification, ultimately accelerating the drug development process.
Keywords: Machine learning, Deep learning, Drug design, Stacked autoencoder, Particle swarm optimization algorithm
Subject terms: Drug discovery, Evolution, Neuroscience, Engineering
Introduction
Drug discovery, the process of identifying and developing compounds that can regulate biological processes within the human body, is a cornerstone of modern medicine. Traditionally, this endeavor has relied heavily on the laborious and time-intensive task of combining atomic or molecular fragments to create novel drug candidates. While methods such as Monte Carlo and genetic algorithms have been employed to streamline this process1, they often suffer from computational inefficiency, low scalability, and suboptimal predictive accuracy, leading to unreliable results in real-world applications. Moreover, significant challenges persist, including prolonged development timelines, substantial financial investments, and a low success rate attributed to high attrition in late-stage trials. The typical drug development pipeline, encompassing target identification and regulatory approval, can extend over a decade (10–17 years), incur costs ranging from $2 to $3 billion, and ultimately yield a success rate of less than 10%2. These formidable obstacles underscore the critical need for innovative computational approaches that enhance predictive reliability while reducing cost and time constraints.
The emergence of Artificial Intelligence (AI) and deep learning (DL) methodologies has ushered in a new era of drug discovery. AI-powered techniques, particularly those rooted in machine learning (ML) and DL, offer a paradigm shift from conventional computational methods. Unlike traditional techniques, AI excels in handling high-dimensional data, uncovering complex molecular patterns, and optimizing biochemical interactions with minimal human intervention. For instance, ML models have been leveraged for predicting drug properties, molecular interactions, and biological activities, automating tasks that previously required extensive experimental validation3. However, many existing AI-based models still suffer from key limitations, including overfitting, poor generalization to unseen molecular structures, and inefficiencies in training high-dimensional datasets4. Deep learning, particularly multi-layered architectures, has addressed some of these shortcomings by leveraging feature extraction and hierarchical representation learning5.
Computer-aided drug design (CADD) traditionally encompasses two primary approaches: structure-based drug design (SBDD) and ligand-based drug design (LBDD)6. While these strategies have significantly advanced pharmaceutical research, they often rely on simplified molecular representations and heuristic scoring functions, which may lead to suboptimal predictions and high false-positive rates. AI integration has mitigated some of these drawbacks by enhancing predictive modeling through non-linear feature extraction, improving binding affinity predictions, and reducing reliance on handcrafted molecular descriptors. For example, support vector machines (SVMs), Bayesian classifiers, random forests, and boosting algorithms have demonstrated improvements in drug design efficiency over conventional computational approaches7. However, these models still require substantial feature engineering and exhibit performance degradation when applied to novel chemical entities8. Recent advancements in deep learning architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and autoencoder networks, have significantly improved drug discovery outcomes by capturing intricate molecular representations9.
Deep learning models utilize multiple layers of non-linear transformations to detect abstract and latent features that may elude conventional computational techniques. A key advantage of deep learning lies in its ability to process large-scale, unstructured, and heterogeneous datasets without requiring extensive manual feature extraction. This significantly improves both prediction accuracy and computational efficiency10. Nevertheless, challenges such as inefficient hyperparameter tuning, limited interpretability, and susceptibility to data bias remain major hurdles in deploying these models for real-world pharmaceutical applications11. Mitigating overfitting and ensuring optimal generalization are critical concerns, requiring robust regularization techniques, adaptive learning rate strategies, and transfer learning methodologies12,13.
Within the realm of data-driven learning approaches, optimization algorithms play a pivotal role in enhancing the performance of AI models. Among these methods, Particle Swarm Optimization (PSO) has emerged as a highly effective tool for fine-tuning model parameters. By emulating the collective behavior observed in biological swarms, PSO dynamically balances exploration and exploitation, improving convergence speed and stability in high-dimensional optimization problems. Unlike traditional gradient-based optimization techniques, PSO does not rely on derivative information, making it particularly advantageous for optimizing non-convex objective functions commonly encountered in drug discovery14.
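To make the mechanism concrete, the following minimal sketch implements the canonical PSO velocity and position updates on a toy objective. The coefficient values (w = 0.7, c1 = c2 = 1.5), the bounds, and the sphere objective are illustrative assumptions rather than settings from this study, and this is the generic PSO, not the HSAPSO variant introduced later.

```python
import random

random.seed(42)  # for reproducibility of this illustration

def pso(objective, dim, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimizer (illustrative defaults)."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update balances inertia, cognitive, and social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Usage: minimize the sphere function, a standard non-convex-free benchmark
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
```

Because PSO needs only objective-function evaluations, the same loop applies unchanged whether the objective is a benchmark function or a cross-validated model error.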
This research addresses the critical limitations in both traditional and AI-driven drug discovery methods by introducing an innovative deep learning framework. The proposed model operates in two distinct phases: firstly, drug-related data undergoes rigorous preprocessing to ensure optimal input quality; secondly, a Stacked Autoencoder (SAE), fine-tuned using Hierarchically Self-Adaptive PSO (HSAPSO), performs classification. This study represents the first instance of utilizing HSAPSO to optimize SAE hyperparameters in the context of drug discovery, achieving an unprecedented accuracy of 95.5% on a curated pharmaceutical dataset. This framework is designed to surpass existing models in predictive reliability, scalability, and computational efficiency, thereby addressing the fundamental challenges faced in AI-driven drug discovery15.
Beyond enhancing predictive accuracy, this study underscores the broader impact of integrating deep learning with adaptive optimization techniques. The proposed HSAPSO-SAE framework addresses fundamental inefficiencies within the conventional drug development pipeline, streamlining processes to reduce costs, minimize timelines, and optimize computational resources. By providing a robust and scalable model for drug discovery, this research reinforces the transformative potential of AI and bioinformatics in accelerating pharmaceutical innovation and improving therapeutic discovery outcomes16,17. The three primary contributions of this research are summarized as follows:
This study presents an advanced Optimized Stacked Autoencoder (optSAE) integrated with Hierarchically Self-Adaptive PSO (HSAPSO), marking the first application of this optimization technique for hyperparameter tuning in pharmaceutical classification tasks. The framework achieves a classification accuracy of 95.5%, outperforming conventional approaches by improving stability, computational efficiency, and predictive reliability. The model’s adaptability allows it to process large-scale datasets, demonstrating its potential for real-world drug discovery applications.
The proposed method addresses computational inefficiencies in existing drug classification models by leveraging deep learning and evolutionary optimization techniques. While traditional drug discovery methods are time-intensive and resource-demanding, this framework enhances predictive efficiency by reducing computational overhead and optimizing feature extraction. The study does not claim to eliminate drug discovery costs but highlights how AI-driven optimization can reduce model training time, improve parameter selection, and enhance classification precision, ultimately contributing to streamlined decision-making in early-stage drug research.
Unlike static machine learning approaches, HSAPSO dynamically adapts hyperparameters during training, optimizing the trade-off between exploration and exploitation. This adaptability enhances generalization across diverse pharmaceutical datasets, mitigating issues such as overfitting and suboptimal hyperparameter selection. Comparative analysis with state-of-the-art methods validates the robustness of the proposed approach, demonstrating its potential for wider applications beyond drug classification, such as protein interaction modeling and biomolecular structure prediction.
The paper is structured as follows: “Related work” provides a detailed overview of prior research in drug classification and target identification, with a focus on machine learning and optimization-based methodologies. “Proposed model” introduces the proposed framework, describing its design and implementation, particularly emphasizing the processes of feature extraction and parameter optimization. “Experimental results” outlines the experimental setup, including descriptions of the datasets utilized and the evaluation metrics applied. The results, also presented in “Experimental results”, compare the performance of the proposed approach with existing state-of-the-art methods. “Discussion and interpretation” explores the broader implications and impact of the proposed framework, while “Conclusion” concludes the paper, summarizing the main contributions and suggesting directions for future research in pharmaceutical classification and optimization.
Related work
The identification of druggable protein targets and the optimization of deep learning models for biomedical applications have received significant attention in recent years. Prior research in this domain broadly falls into two major categories: studies that focus on the task of predicting druggable proteins using machine learning and deep learning techniques, and those that concentrate on optimizing model performance through advanced hyperparameter search strategies. In this section, we review key contributions from both perspectives to contextualize our proposed approach within the current landscape.
Advances in druggable target prediction
Recent advancements in machine learning have significantly enhanced the prediction of drug-target interactions, druggability classification, and molecular optimization. A wide array of models, including SVMs, DNNs, CNNs, and XGBoost, have been employed to process and learn from complex biological data, facilitating computational drug discovery.
For example, Jamali et al.18 developed DrugMiner using SVMs and neural networks, achieving 89.98% accuracy by leveraging 443 protein features extracted from validated sources. Jiang et al.19 employed SVM and XGBoost to predict resistance to breast cancer drugs, achieving MCC = 0.812 and AUC = 0.958. Similarly, Fralish et al.20 proposed a novel classification framework for molecular potency improvement, which outperformed traditional methods on 230 ChEMBL datasets. You et al.21 designed a deep learning model to predict drug-target interactions using integrated descriptors and protein sequences, which demonstrated superior performance and reduced development costs.
Stepniewska-Dziubinska et al.22 implemented a 3D convolutional neural network for binding site identification, enabling accurate structural predictions. Monteiro et al.23 used CNNs to predict drug-target interactions by combining molecular descriptors and protein sequences, yielding higher accuracy than traditional machine learning approaches. Lin et al.24 incorporated a Bagging-SVM ensemble with a genetic algorithm for feature selection, achieving 93.78% accuracy and enhanced computational efficiency. Chen et al.25 proposed a stacked ensemble classifier using XGBoost and feature selection to improve protein-protein interaction prediction, while Sikander et al.26 introduced XGB-DrugPred, a model that achieved 94.86% accuracy using optimized DrugBank features.
Zhang et al.27 leveraged graph-based deep learning and transformer-like architectures to analyze protein sequences, achieving 95% accuracy on cancer-specific datasets. Wu et al.28 built ensemble models to predict CYP450 inhibition using DNNs and XGBoost, with average accuracy of 90.4%. Song et al.29 combined CNNs and LightGBM to identify ATP-binding sites, reaching an AUC of 0.925. Wang et al.30 applied capsule networks to classify hERG blockers/non-blockers, achieving 92.2% accuracy. Mustapha et al.31 employed XGBoost to predict molecular bioactivity, outperforming SVM and RF in imbalanced datasets. Kundu et al.32 explored the use of quantum tensor networks for protein classification, achieving 94% accuracy with lower resource demands.
These studies highlight how integrating learning architectures with domain-specific features has advanced druggable target prediction. Despite differences in scope and dataset size, most models face limitations such as dependency on high-quality features, computational intensity, or generalizability constraints. The proposed method builds on this foundation by incorporating a powerful deep representation model (optSAE) enhanced with adaptive optimization, addressing these challenges holistically.
Methodological perspectives on hyperparameter optimization
Hyperparameter optimization is a cornerstone of model performance, particularly in high-dimensional biological domains. While classic methods like Random Search and Grid Search are simple to implement, they often fail to efficiently explore complex, non-convex search spaces. As a result, researchers have explored metaheuristic techniques such as PSO and genetic algorithms to overcome these limitations.
Toussi and Haddadnia33 applied PSO to optimize SVM and neural network models for protein structure prediction, improving accuracy by 5–6%. Lin et al.24 used a genetic algorithm for feature selection in a Bagging-SVM framework, reducing dimensionality and improving run-time performance. Chen et al.25 and Sikander et al.26 also adopted feature optimization strategies, using XGBoost and ensemble classifiers to boost classification performance on biomedical data. These works affirm the value of incorporating optimization layers, but they still rely on static parameter tuning or one-shot optimization.
The present study introduces Hierarchically Self-Adaptive PSO (HSAPSO), a novel variant that dynamically adjusts inertia weight and learning coefficients during optimization. By embedding hierarchical learning mechanisms, HSAPSO enables faster convergence and enhanced generalization across data folds. While previous methods offer substantial gains through static or semi-adaptive optimization, HSAPSO provides a fully adaptive strategy that ensures superior convergence with fewer iterations and greater robustness. Its offline optimization structure mitigates the computational overhead typically associated with metaheuristics, focusing on improving generalization and stability in test-time deployment. Collectively, the proposed optSAE + HSAPSO framework represents a methodological advancement over existing strategies by combining deep feature extraction with adaptive metaheuristic optimization, resulting in high accuracy, efficient inference, and resilient model behavior in real-world drug discovery scenarios.
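The self-adaptive idea — tuning the inertia weight and learning coefficients during the run rather than fixing them — can be illustrated with a toy schedule such as the one below. The linear decay, the diversity threshold, and the coefficient ranges are hypothetical choices for illustration only; they are not the actual HSAPSO update rules.

```python
def adaptive_coefficients(iteration, max_iter, diversity,
                          w_max=0.9, w_min=0.4,
                          c1_init=2.5, c1_final=0.5,
                          c2_init=0.5, c2_final=2.5):
    """Illustrative self-adaptive schedule: inertia decays with progress but is
    re-inflated when swarm diversity collapses; c1/c2 shift the search from
    cognitive (exploration) toward social (exploitation) behavior."""
    t = iteration / max_iter
    w = w_max - (w_max - w_min) * t
    # boost inertia if the swarm has converged prematurely (hypothetical rule)
    if diversity < 0.05:
        w = min(w_max, w + 0.2)
    c1 = c1_init + (c1_final - c1_init) * t
    c2 = c2_init + (c2_final - c2_init) * t
    return w, c1, c2

# Usage: halfway through a run with healthy diversity
w, c1, c2 = adaptive_coefficients(iteration=50, max_iter=100, diversity=0.3)
```

Each particle (or hierarchy level) can call such a schedule independently, which is what distinguishes adaptive variants from the static parameter tuning used in earlier work.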
Proposed model
Figure 1 presents the essential stages of the proposed drug design framework, which combines advanced machine learning methodologies with sophisticated optimization strategies to address drug classification challenges. The pipeline begins with protein data preprocessing for quality assurance, followed by dual feature extraction that leverages contextual embeddings (e.g., ProtBERT) and evolutionary features.
Fig. 1.
Showcases the main components of the drug design framework, demonstrating the step-by-step progression from initial data preparation to the system’s ultimate deployment.
An SAE handles classification, optimized through a Hierarchically Self-Adaptive PSO (HSAPSO) algorithm, which adapts dynamically for robust parameter tuning.
Dataset
To classify potential druggable proteins, protein data was collected from the DrugBank18,34 and Swiss-Prot35 databases, which include comprehensive collections of protein sequences (see Table 1). The dataset under study consists of 2543 protein sequences, comprising 1224 druggable target proteins from DrugBank and 1319 non-target proteins from Swiss-Prot. These databases focus on proteins that act as drug targets in the human body, with classification based on various features, including structure, function, and biological interactions. The raw data consists of amino acid sequences stored as textual strings, varying in length and complexity, without any predefined numerical dimensions. These sequences were then transformed into numerical features suitable for machine learning models through feature extraction methods. Key features from the protein sequences were derived using approaches such as dipeptide composition, which encodes each sequence into a numerical vector representation for subsequent analysis.
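As a concrete illustration of the dipeptide-composition encoding mentioned above, the sketch below maps a sequence to a 400-dimensional vector of normalized dipeptide frequencies (20 × 20 standard amino-acid pairs); the exact descriptor set used in the study may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def dipeptide_composition(seq):
    """Encode a protein sequence as a 400-d vector of dipeptide frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]

vec = dipeptide_composition("MKVLAAGM")  # toy sequence for illustration
```

Because the output length is fixed regardless of sequence length, such vectors can be fed directly to classifiers without padding or truncation.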
Table 1.
Summary of data from DrugBank, Swiss-Prot, and the selected dataset. The table highlights total entries, data types, unique targets, sequence statistics, and applications across the three categories.
| Category | DrugBank | Swiss-Prot | Selected data |
|---|---|---|---|
| Total entries | 10,000 drugs | 20,000 proteins | 2543 protein sequences |
| Types | Small molecules: 7500; Biologics: 2500 | Human proteins: 20,000; E. coli proteins: 4500 | Druggable targets: 1224; Non-targets: 1319 |
| Unique targets | 1200 | N/A | 1224 druggable proteins |
| Average targets per drug | 2.3 | N/A | N/A |
| Drug categories | Anticancer: 800; Antibiotics: 1200; Cardiovascular: 600 | Proteins with domains: 15,000; Phosphorylated proteins: 5000 | N/A |
| Sequence statistics | N/A | Average length: 350; Range: 50–2320 | Text sequences, varying length |
| Applications | Drug-target interaction, mechanism analysis | Protein function, structure, interaction | Druggable target classification |
To construct a robust and unbiased dataset for predicting potential druggable proteins, we carefully selected 1319 non-target proteins from Swiss-Prot. Given that Swiss-Prot contains over 20,000 human proteins, it was essential to filter this large dataset to retain only high-quality and biologically relevant proteins. One major challenge was redundancy in protein sequences, where many entries shared high sequence similarity, potentially introducing bias into machine learning models. To address this, we applied Cluster Database at High Identity with Tolerance (CD-HIT), a widely used clustering algorithm, to eliminate redundant sequences and retain unique representative proteins. In this study, CD-HIT was used with a fixed identity threshold of 90%, ensuring removal of highly similar sequences while preserving diversity. This preprocessing step helped mitigate data leakage and improved generalization.
This step significantly improves computational efficiency and ensures meaningful learning from distinct protein sequences36. Additionally, many proteins in Swiss-Prot lack complete annotations or validated functional data, making them unsuitable for inclusion in predictive modeling. To enhance the reliability of the dataset, we excluded poorly characterized proteins and retained those with sufficient biological information. Another key factor in the selection process was dataset balancing, as imbalanced datasets can lead to biased classification models. To ensure fair and unbiased learning, we carefully matched the number of druggable proteins (1224) and non-target proteins (1319) to create a balanced dataset. This selection approach follows best practices in bioinformatics and computational drug discovery to construct a high-quality, non-redundant, and biologically informative dataset, facilitating accurate and generalizable predictions37.
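In practice, CD-HIT is invoked as a command-line tool (e.g., `cd-hit -i input.fasta -o filtered.fasta -c 0.9` for a 90% identity threshold). The greedy principle behind it can be sketched as follows; this toy version uses naive positional identity rather than CD-HIT's short-word filtering and banded alignment, and is for illustration only.

```python
def identity(a, b):
    """Crude pairwise identity over the shared prefix (real CD-HIT uses
    short-word counting followed by banded sequence alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n

def greedy_cluster(sequences, threshold=0.9):
    """CD-HIT-style greedy clustering: scan sequences longest-first and keep
    one as a new representative only if it is below the identity threshold
    against every representative kept so far."""
    representatives = []
    for seq in sorted(sequences, key=len, reverse=True):
        if all(identity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives

# Usage: the exact duplicate is removed; near-identical and dissimilar kept
reps = greedy_cluster(["MKVLAAG", "MKVLAAG", "MKVLTAG", "GGGYYWW"],
                      threshold=0.9)
```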
Moreover, the classification of druggable versus non-druggable targets was determined using established biochemical and pharmacological criteria, supported by curated datasets and experimental validation. Druggable targets are defined as proteins or biomolecules that exhibit high binding affinity with small molecules and have been experimentally validated in databases like DrugBank and ChEMBL. These targets typically demonstrate Kd, Ki, or IC50 values in the nanomolar to micromolar range, indicating their potential for therapeutic intervention. Conversely, non-druggable targets are proteins that lack significant binding interactions with known bioactive compounds or have shown poor ligandability in high-throughput screening assays. The dataset was constructed by integrating information from experimentally validated drug-target interactions (DTIs), ensuring that each classification is based on real-world pharmacological data. Additionally, computational filters, such as molecular docking scores and structural ligandability assessments, were used to improve the reliability of this classification.
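A minimal sketch of this affinity-based labeling rule is shown below; the 10 µM cut-off is one illustrative convention for the "nanomolar to micromolar range" described above, not a threshold prescribed by the study.

```python
def is_druggable(affinities_nm):
    """Label a target druggable if any validated binding measurement
    (Kd, Ki, or IC50, expressed in nanomolar) falls within the
    nanomolar-to-micromolar range. The 10 uM (10,000 nM) cut-off is an
    illustrative convention, not the study's exact criterion."""
    MICROMOLAR_CUTOFF_NM = 10_000
    return any(0 < a <= MICROMOLAR_CUTOFF_NM for a in affinities_nm)

# Usage: one nanomolar-range Kd is enough to label the target druggable
label = is_druggable([250.0, 15_000.0])
```

In the actual dataset construction, such affinity filters would be combined with docking scores and ligandability assessments, as noted above.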
Protein analysis
Protein data preprocessing is a critical step to ensure the quality and relevance of input data before entering the feature extraction phase. The process begins with data collection from reliable protein databases such as UniProt, PDB, or other curated repositories. These sources provide detailed structural and functional information about proteins, including sequences, annotations, and three-dimensional configurations. The raw data often contains inconsistencies, missing values, or redundant information, which must be addressed through cleaning and normalization. Cleaning involves removing duplicate entries, filling missing information using imputation techniques, and correcting errors in sequences. Normalization ensures that data from diverse sources are standardized, enabling compatibility across analytical tools. For example, sequence lengths are truncated or padded to uniform sizes to maintain consistency during processing.
Once the raw data is cleaned and standardized, additional preprocessing techniques such as encoding and transformation are applied to make the data machine-readable. For sequence-based analyses, amino acid sequences are often transformed into numerical representations using encoding schemes like one-hot encoding, position-specific scoring matrices (PSSMs), or embeddings generated by models like ProtBERT. Structural data may undergo geometric transformations or be converted into graph-based representations, where nodes represent atoms or residues, and edges indicate bonds or interactions. These representations capture both spatial and sequential information. Moreover, noise reduction techniques such as low-pass filtering or Principal Component Analysis (PCA) are employed to eliminate irrelevant variations, preserving critical features while reducing dimensionality. By ensuring the data is accurately prepared and adequately transformed, the preprocessing phase sets the stage for effective and meaningful feature extraction in the subsequent steps.
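For example, the one-hot encoding with fixed-length padding described above can be sketched as follows (the 20-letter alphabet and the `max_len` value are illustrative):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq, max_len=5):
    """One-hot encode a sequence into max_len rows of 20 values, truncating
    long sequences; padding positions and unknown residues are all-zero rows."""
    matrix = []
    for i in range(max_len):
        row = [0.0] * len(AMINO_ACIDS)
        if i < len(seq) and seq[i] in AA_INDEX:
            row[AA_INDEX[seq[i]]] = 1.0
        matrix.append(row)
    return matrix

m = one_hot_encode("MKV", max_len=5)  # 3 residues, 2 zero padding rows
```

PSSMs and learned embeddings replace each one-hot row with a denser, more informative vector, but the padding/truncation logic is the same.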
The post-processing steps and fundamental components of the workflow, including the integration of key optimization strategies, are further detailed in Fig. 2, which provides a comprehensive representation of the core algorithms and methodologies employed in the optSAE + HSAPSO framework.
Fig. 2.
The figure depicts the workflow of the proposed method, outlining the sequential stages from data preprocessing to feature extraction, integration, optimization, and classification.
Feature extraction
The proposed feature extraction method combines contextual embeddings from pre-trained models with evolutionary features derived from multiple sequence alignments (MSAs). This dual approach captures both global context and local evolutionary constraints, providing a comprehensive feature set for downstream machine learning tasks. Pre-trained models, such as ProtBERT and ESM, transform protein sequences into high-dimensional embeddings. These embeddings encode the sequential and contextual relationships between amino acids. Given a protein sequence $S = (A_1, A_2, \ldots, A_n)$, where $A_i$ represents the i-th amino acid:

$$E = f_\theta(S), \quad E \in \mathbb{R}^{n \times d} \tag{1}$$

Here, $E \in \mathbb{R}^{n \times d}$ is the embedding matrix, $d$ is the embedding dimension, and $f_\theta$ denotes the pre-trained model. Each row $e_i$ captures the contextual representation of amino acid $A_i$. The embeddings are then averaged across the sequence to produce a fixed-size vector:

$$f_{\text{context}} = \frac{1}{n} \sum_{i=1}^{n} e_i \tag{2}$$

This sequence-wide representation captures global properties of the protein. Evolutionary features highlight conserved regions critical to protein function. MSAs align the target protein sequence with homologous sequences to calculate conservation metrics. Given an MSA $M$, where $M_{ij}$ represents the j-th residue of the i-th aligned sequence, the conservation score for position $j$ is calculated as:

$$C_j = -\sum_{k=1}^{m} p_{jk} \log p_{jk} \tag{3}$$

where $p_{jk}$ is the frequency of amino acid $k$ at position $j$, and $m$ is the total number of amino acid types. This entropy-based score $C_j$ measures sequence variability, with lower values indicating higher conservation. The conservation scores are normalized and concatenated into a feature vector:

$$f_{\text{evo}} = [C_1, C_2, \ldots, C_L] \tag{4}$$

To leverage both contextual embeddings and evolutionary insights, the feature vectors are integrated through concatenation:

$$f_{\text{combined}} = f_{\text{context}} \oplus f_{\text{evo}} \tag{5}$$

Here, $\oplus$ denotes vector concatenation. The combined feature vector $f_{\text{combined}}$ captures both global context and local evolutionary constraints. Although attention mechanisms are commonly used to weight heterogeneous features, in this work we adopted a simple concatenation strategy to merge the learned representations of protein and drug features. This design choice ensured computational efficiency and interpretability, while maintaining strong predictive performance:

$$f_{\text{weighted}} = \alpha \, f_{\text{context}} \oplus (1 - \alpha) \, f_{\text{evo}} \tag{6}$$

where $\alpha$ is a learnable parameter that balances the contributions of contextual and evolutionary features.

Moreover, to ensure uniform scaling and reduce redundancy, the integrated feature vector is normalized and subjected to dimensionality reduction using Principal Component Analysis (PCA):

$$f_{\text{final}} = \mathrm{PCA}(\mathrm{normalize}(f_{\text{combined}})) \tag{7}$$

This transformation minimizes noise while preserving the most informative dimensions for predictive tasks. The final reduced feature vector $f_{\text{final}}$ is used as input for downstream machine learning models. Its compact yet rich representation enhances the efficiency and accuracy of predictions, such as drug-target interactions or protein function annotation. This advanced feature extraction framework combines the strengths of contextual embeddings and evolutionary metrics to produce a robust representation of protein sequences. By integrating global and local features, the approach captures the intricate patterns necessary for identifying druggable proteins and provides a scalable solution for large datasets.
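The entropy-based conservation scoring and feature concatenation described above can be sketched in plain Python as follows. The short context vector stands in for a mean-pooled ProtBERT embedding (real embeddings are on the order of 1024 dimensions), and min-max scaling is one plausible choice for the normalization step.

```python
import math

def conservation_scores(msa):
    """Shannon-entropy conservation per alignment column (Eq. 3);
    lower entropy means higher conservation."""
    length = len(msa[0])
    scores = []
    for j in range(length):
        column = [seq[j] for seq in msa]
        freqs = {aa: column.count(aa) / len(column) for aa in set(column)}
        scores.append(-sum(p * math.log(p) for p in freqs.values()))
    return scores

def min_max_normalize(values):
    """Scale scores to [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# stand-in for a mean-pooled contextual embedding (Eq. 2)
f_context = [0.12, -0.40, 0.33, 0.05]

msa = ["MKVLA", "MKVIA", "MKVLA", "MRVLA"]     # toy 4-sequence alignment
f_evo = min_max_normalize(conservation_scores(msa))  # Eqs. 3-4

f_combined = f_context + f_evo                  # Eq. 5: concatenation
```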
Model training and optimization
The use of a Stacked Autoencoder (SAE) in the second stage for feature extraction and classification is driven by its ability to handle high-dimensional, complex data efficiently. Unlike a plain Autoencoder (AE), which consists of a single encoding and decoding layer, the proposed SAE features multiple encoding and decoding layers, allowing it to learn hierarchical feature representations. SAEs excel at reducing dimensionality while preserving critical information, capturing non-linear relationships, and learning structured representations essential for drug discovery tasks. A key characteristic of SAEs is their dual functionality:
Unsupervised pretraining for feature extraction using a reconstruction-based learning objective.
Supervised fine-tuning for classification, where the extracted features are optimized for predictive accuracy.
Additionally, SAEs demonstrate robustness to noisy or incomplete data, ensuring reliable performance in real-world biological datasets. This two-stage approach enhances feature representation quality and boosts classification accuracy by leveraging both hierarchical feature learning and optimization techniques. Following feature extraction, the encoded features are utilized in a classification model. In this framework, a SAE is employed for feature learning and dimensionality reduction before classification. Notably, the term optSAE refers to a structurally enhanced version of the base SAE architecture. While the original SAE employs a standard three-layer encoding/decoding scheme, optSAE was designed with a deeper architecture and additional latent capacity. This refined structure was established prior to hyperparameter tuning and aims to improve representational learning. Subsequently, its core hyperparameters—including latent dimension size, dropout rate, and learning rate—were optimized using HSAPSO, enabling superior convergence and generalization performance.
The proposed SAE consists of three encoding layers (256, 128, and 64 neurons) and three decoding layers (64, 128, and 256 neurons). Each layer learns progressively abstract feature representations, enabling the model to effectively capture intricate drug-related patterns. This deep architecture significantly differs from a standard autoencoder, which typically consists of only a single encoding and decoding transformation. The encoding layers successively compress the input data into a latent representation, which retains the most informative features while discarding redundant noise. The decoding layers then attempt to reconstruct the original input from this compressed space, ensuring that the learned feature embeddings maintain essential structural and functional characteristics. Thus, the SAE undergoes a two-stage training process:
Unsupervised pretraining: The encoding layers learn to compress high-dimensional feature vectors into a lower-dimensional latent space, while the decoding layers reconstruct the original input, minimizing reconstruction loss.
Supervised fine-tuning: The latent space representations are passed through a fully connected classification layer optimized using cross-entropy loss for binary classification.
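As an illustration of this two-stage scheme, the sketch below runs one forward pass of each stage in plain NumPy with randomly initialized weights and toy data. The dimensions follow the 400-input / 64-latent / 2-class description in the text; this is a minimal conceptual sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(a, 0.0)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

m, d, k = 8, 400, 64          # toy batch, 400-dim dipeptide input, 64-dim latent
X = rng.random((m, d))

# Stage 1: unsupervised pretraining -- encode, decode, reconstruction loss.
W_e, b_e = rng.normal(0, 0.01, (d, k)), np.zeros(k)
W_d, b_d = rng.normal(0, 0.01, (k, d)), np.zeros(d)
Z = relu(X @ W_e + b_e)                          # latent representation
X_hat = 1.0 / (1.0 + np.exp(-(Z @ W_d + b_d)))   # sigmoid reconstruction
L_rec = np.mean(np.sum((X - X_hat) ** 2, axis=1))

# Stage 2: supervised fine-tuning -- 2-neuron dense head with softmax,
# trained against a cross-entropy loss.
W_f, b_f = rng.normal(0, 0.01, (k, 2)), np.zeros(2)
Y_hat = softmax(Z @ W_f + b_f)
y = np.eye(2)[rng.integers(0, 2, m)]             # one-hot toy labels
L_cls = -np.mean(np.sum(y * np.log(Y_hat + 1e-12), axis=1))
```

In practice both stages would be trained iteratively; here only the forward computations and losses are shown.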
During the pretraining phase, the encoding layers progressively compress the input feature vector x ∈ R^d into a latent representation z ∈ R^k using the following transformation:

z = f(W_e x + b_e)  (8)

where W_e represents the encoder’s weight matrix, b_e is the bias vector, and f is the activation function (e.g., ReLU or sigmoid). The decoding layers aim to reconstruct x from the latent representation z to ensure that the extracted features retain relevant information:

x̂ = g(W_d z + b_d)  (9)

where W_d represents the decoder’s weight matrix, b_d is the bias vector, and g is an activation function such as sigmoid. The reconstruction error, minimized during unsupervised pretraining, is defined as:

L_rec = (1/m) Σ_{i=1}^{m} ‖x_i − x̂_i‖²  (10)

where m is the number of training samples, x_i denotes the i-th input sample, and x̂_i represents its reconstruction. Once pretraining is complete, the encoder’s output (64-dimensional latent representation) is not directly used for classification. Instead, it is passed through a fully connected dense layer with two neurons, where a softmax function is applied:

ŷ = softmax(z′)  (11)

where z′ = f(W_f z + b_f) is the transformed latent representation after passing through the final dense layer, with W_f ∈ R^{2×64} the layer’s weight matrix and b_f ∈ R² the corresponding bias vector. This ensures that the classifier operates on a two-dimensional space, which aligns with the binary classification task (drug-target vs. non-target). The supervised fine-tuning process minimizes the cross-entropy loss:

L_cls = −(1/m) Σ_{i=1}^{m} Σ_{j=1}^{2} y_{ij} log ŷ_{ij}  (12)

where y_{ij} is the one-hot encoded label for class j of sample i. The total loss function integrates both reconstruction loss (for feature learning) and classification loss (for predictive accuracy), balanced by λ as a weighting factor:

L_total = L_rec + λ L_cls  (13)
Optimized learning
The parameters θ = {W_e, b_e, W_d, b_d, W_f, b_f} are optimized using gradient-based methods, such as stochastic gradient descent (SGD), to minimize L_total:

θ ← θ − η ∇_θ L_total  (14)

where η is the learning rate and θ represents the weights and biases of the encoder, decoder, and classifier in the SAE. These parameters are updated iteratively using SGD, ensuring effective learning and improved model performance.
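A minimal sketch of this update rule, using a toy quadratic stand-in for the total loss and a finite-difference gradient (in the real model, backpropagation supplies the gradient; the loss, dimensions, and learning rate here are purely illustrative):

```python
import numpy as np

def L_total(theta):
    # Toy quadratic stand-in for the combined loss; minimum at theta = 1.
    return float(np.sum((theta - 1.0) ** 2))

def numerical_grad(loss, theta, eps=1e-6):
    # Central-difference gradient; backprop replaces this in practice.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        up, dn = theta.copy(), theta.copy()
        up[i] += eps
        dn[i] -= eps
        g[i] = (loss(up) - loss(dn)) / (2.0 * eps)
    return g

theta = np.zeros(5)   # flattened encoder/decoder/classifier parameters
eta = 0.1             # learning rate
for _ in range(100):  # Eq. (14): theta <- theta - eta * grad(L_total)
    theta = theta - eta * numerical_grad(L_total, theta)
```

After repeated updates the parameters converge toward the loss minimum, which is the behavior Eq. (14) formalizes.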
While gradient-based methods are effective in optimizing network parameters, they do not address the selection of optimal hyperparameters, which significantly influences the performance of the SAE model. The SAE architecture consists of multiple encoding and decoding layers, requiring fine-tuning of several critical hyperparameters. Table 2 summarizes the key hyperparameters considered for optimization.
Table 2.
Key hyperparameters of the SAE prioritized for optimization, along with their descriptions and typical ranges.
| Parameter | Description | Typical range |
|---|---|---|
| Number of layers (L) | Total number of encoding and decoding layers in the SAE | 2–10 |
| Number of neurons (n) | Number of neurons per layer, defining model capacity | 16–512 |
| Learning rate (lr) | Step size in gradient descent optimization, affecting convergence | 0.0001–0.01 |
| Latent dimension (k) | Size of the bottleneck layer, controlling feature compression | 2–128 |
| Dropout rate (d) | Fraction of neurons randomly dropped during training to prevent overfitting | 0.1–0.5 |
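The ranges in Table 2 can be encoded as a search space from which each optimizer candidate (e.g., a PSO particle's position) is drawn. The dictionary layout and sampling routine below are a hypothetical encoding for illustration, not the authors' code:

```python
import random

random.seed(42)

# Search space mirroring the typical ranges in Table 2.
SPACE = {
    "num_layers":    ("int",   2,    10),
    "num_neurons":   ("int",   16,   512),
    "learning_rate": ("float", 1e-4, 1e-2),
    "latent_dim":    ("int",   2,    128),
    "dropout":       ("float", 0.1,  0.5),
}

def sample_config(space):
    """Draw one random candidate configuration from the search space."""
    cfg = {}
    for name, (kind, lo, hi) in space.items():
        cfg[name] = random.randint(lo, hi) if kind == "int" else random.uniform(lo, hi)
    return cfg

cfg = sample_config(SPACE)
```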
Simultaneously optimizing all SAE parameters is computationally prohibitive and often results in inefficient training. A more effective approach is to concentrate on the key hyperparameters that significantly impact model convergence and generalization. This focused optimization reduces the search space’s dimensionality, accelerating convergence. Furthermore, it enhances the model’s generalization by mitigating overfitting caused by suboptimal configurations. Finally, it minimizes computational overhead, as hyperparameter tuning is limited to the training phase and does not affect inference-time efficiency. To systematically explore this high-dimensional hyperparameter space, we propose Hierarchically Self-Adaptive PSO (HSAPSO), an enhanced extension of standard PSO that integrates multi-layered adaptation mechanisms, ensuring both efficient exploration and stable convergence.
HSAPSO is specifically engineered for hyperparameter optimization, a distinct process from model weight optimization performed by gradient-based methods like SGD. Unlike SGD, which iteratively updates network weights during training, HSAPSO operates before training begins. It dynamically selects optimal hyperparameter values—such as the number of neurons per layer, learning rate, dropout rate, and latent dimension—to maximize model performance. By automating this selection, HSAPSO eliminates the need for manual tuning, ensuring the model is optimally configured from the start. This leads to improved convergence efficiency and enhanced overall generalization.
HSAPSO introduces three key innovations over standard PSO, making it a highly effective approach for hyperparameter optimization. Hierarchical adaptation dynamically balances exploration and exploitation across different learning stages, ensuring an optimal search trajectory. Fitness-based dynamic subgrouping clusters particles based on similarity in fitness scores, facilitating localized refinement and improving search efficiency. Self-adaptive parameter tuning continuously adjusts hyperparameter search behavior in response to real-time feedback, allowing the algorithm to adapt to varying optimization landscapes. These advancements collectively enable HSAPSO to efficiently optimize SAE hyperparameters, significantly enhancing model performance and convergence stability. Therefore, HSAPSO operates at three hierarchical levels, integrating local adaptation, subgroup-level learning, and global search refinement:
(1) Local particle updates: Each particle updates its hyperparameter values using a modified velocity equation:

v_i(t+1) = ω·v_i(t) + c_1 r_1 (p_best,i − x_i(t)) + c_2 r_2 (g_best − x_i(t)) + c_3 r_3 (s_best − x_i(t))  (15)

Here, ω is the inertia weight that controls the influence of a particle’s previous velocity, while c_1, c_2, c_3 are the cognitive, social, and subgroup learning factors, respectively. The random factors r_1, r_2, r_3 ∈ [0, 1] introduce stochasticity, and p_best,i, g_best, s_best represent the personal, global, and subgroup best positions. Including s_best provides an additional layer of guidance, enhancing convergence. HSAPSO introduces dynamic subgrouping, where particles form clusters based on similarities in fitness or spatial proximity. Subgroup formation enables localized refinement and independent evolution within clusters. Subgroup leaders (s_best) periodically communicate with the global swarm to exchange information.
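A sketch of the three-term velocity update of Eq. (15). The scalar random factors, fixed coefficients, and random "positions" here are simplifications; in the actual algorithm each position encodes a hyperparameter configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def hsapso_velocity(v, x, p_best, g_best, s_best,
                    omega=0.9, c1=2.0, c2=2.0, c3=1.5):
    """One velocity update per Eq. (15), with the extra subgroup-best term."""
    r1, r2, r3 = rng.random(3)
    return (omega * v
            + c1 * r1 * (p_best - x)
            + c2 * r2 * (g_best - x)
            + c3 * r3 * (s_best - x))

d = 5  # dimensions: e.g., layers, neurons, learning rate, latent size, dropout
x = rng.random(d)
v = np.zeros(d)
p_best, g_best, s_best = rng.random(d), rng.random(d), rng.random(d)

v_new = hsapso_velocity(v, x, p_best, g_best, s_best)
x_new = x + v_new  # standard PSO position update
```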
(2) Fitness-based subgroup formation: HSAPSO dynamically clusters particles based on fitness similarity:

|f(x_i) − f(x_j)| ≤ δ  (16)

where f(x_i) and f(x_j) represent the fitness values of particles i and j, respectively, and δ is a similarity threshold. This process clusters particles with comparable fitness values. Subsequently, clustering algorithms, such as k-means or density-based methods, are employed to create dynamic subgroups. These subgroups then evolve independently, with their respective leaders updated based on their local best fitness. This dynamic subgrouping mechanism preserves diversity within the swarm, effectively preventing premature convergence, and guarantees that distinct regions of the search space are thoroughly explored.
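One simple way to realize fitness-similarity grouping is a greedy sweep over fitness-sorted particles. This is an illustrative stand-in for the clustering step (the text also allows k-means or density-based methods), with an arbitrarily chosen threshold δ:

```python
def form_subgroups(fitness, delta):
    """Group particle indices whose neighboring fitness values differ by at most delta."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    groups, current = [], [order[0]]
    for i in order[1:]:
        if abs(fitness[i] - fitness[current[-1]]) <= delta:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups

# Hypothetical fitness values for five particles.
fitness = [0.95, 0.52, 0.93, 0.50, 0.72]
groups = form_subgroups(fitness, delta=0.05)  # -> [[3, 1], [4], [2, 0]]
```

Each resulting subgroup then evolves around its own leader, as described above.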
(3) Adaptive inertia weight control: The algorithm dynamically adjusts the inertia weight ω to control exploration and exploitation:

ω = ω_max − (ω_max − ω_min) · (Iter / MaxIter)  (17)

Here, ω_max and ω_min are the maximum and minimum inertia weights, while MaxIter and Iter represent the total and current iterations. This mechanism shifts from exploration to exploitation over time, enhancing convergence and stability. The algorithm employs hierarchical fitness memory, storing solutions at three levels: global (G_best), subgroup (S_best), and temporal (T_best). The hierarchical best position H_best is defined as:

H_best = α·G_best + β·S_best + γ·T_best  (18)

where α, β, γ are dynamically adjusted weights based on optimization performance, and T_best represents the best position over specific time windows. This multi-tier memory prevents information loss and ensures effective guidance throughout optimization stages.
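The linear inertia decay of Eq. (17) and the weighted memory blend of Eq. (18) can be sketched as follows; the fixed α, β, γ are placeholders for the dynamically adjusted weights described above:

```python
def inertia(it, max_iter, w_max=0.9, w_min=0.4):
    """Eq. (17): linear decay from w_max (exploration) to w_min (exploitation)."""
    return w_max - (w_max - w_min) * (it / max_iter)

def hierarchical_best(g_best, s_best, t_best, alpha=0.5, beta=0.3, gamma=0.2):
    """Eq. (18): weighted blend of global, subgroup, and temporal best positions.
    The alpha/beta/gamma values are illustrative; the algorithm adapts them."""
    return [alpha * g + beta * s + gamma * t
            for g, s, t in zip(g_best, s_best, t_best)]

w_start = inertia(0, 50)    # 0.9 at the first iteration
w_end = inertia(50, 50)     # 0.4 at the last iteration
h = hierarchical_best([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
```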
To validate the effectiveness of HSAPSO, we conducted extensive experiments comparing its performance against several established hyperparameter optimization methods. These included Grid Search, a brute-force approach that exhaustively evaluates all possible hyperparameter combinations; Bayesian Optimization, which employs probabilistic modeling to guide the search process efficiently; and Standard PSO, which lacks the hierarchical adaptation mechanisms of HSAPSO. Additionally, we compared SGD with manually chosen hyperparameters to SGD with HSAPSO-optimized hyperparameters, evaluating their impact on model convergence and performance. The results demonstrated that HSAPSO significantly accelerates convergence and improves final model accuracy, consistently outperforming other optimization techniques in both efficiency and generalization.
HSAPSO is particularly well-suited for optimizing SAE hyperparameters due to its ability to dynamically balance exploration and exploitation, ensuring efficient navigation of the hyperparameter space. Its hierarchical learning mechanism enables it to handle high-dimensional search spaces effectively, allowing it to identify optimal configurations with greater precision. Furthermore, its adaptive subgrouping strategy prevents overfitting and premature convergence by maintaining population diversity throughout the optimization process. These attributes make HSAPSO a highly robust and scalable hyperparameter optimization method, particularly advantageous for complex applications such as drug discovery, where fine-tuning deep learning architectures is crucial for achieving reliable predictive performance.
The scalability, adaptability, and precision of HSAPSO further reinforce its effectiveness in optimizing SAE parameters. Its dynamic subgrouping and hierarchical learning mechanisms enable efficient management of high-dimensional search spaces, while its ability to incorporate environmental feedback ensures adaptation to the SAE’s loss landscape. Additionally, the use of hierarchical fitness memory and subgroup refinement enhances convergence toward optimal configurations. By systematically addressing challenges such as overfitting, premature convergence, and computational complexity, HSAPSO offers a robust and computationally efficient solution for hyperparameter optimization, making it particularly well-suited for high-stakes applications such as drug design.
Implementation details
The implementation of the proposed framework begins with data preprocessing, where protein sequences from DrugBank and Swiss-Prot were standardized for consistency. The dataset includes 2543 protein sequences, comprising 1224 druggable targets and 1319 non-targets. The sequences, initially stored as plain amino acid strings, were processed and transformed into numerical feature vectors using dipeptide composition, yielding 400-dimensional vectors that capture the frequency of amino acid pairs. These features were normalized using min-max scaling to enhance model training efficiency. The dataset was initially partitioned into 90% training and 10% testing subsets, maintaining a balanced distribution of druggable and non-druggable proteins. To enhance model robustness and avoid overfitting, we employed k-fold cross-validation exclusively within the training set. After empirical evaluation of multiple values for k (ranging from 5 to 10), we selected 6-fold cross-validation as the optimal setting, offering a strong balance between computational efficiency, performance stability, and generalization. In this setup, the training partition was divided into six equally sized folds; in each iteration, five folds (75% of the total data) were used for training and one fold (15%) served as a temporary validation set. This rotating validation scheme ensured that every sample participated in both training and validation, providing a more reliable estimate of model performance across unseen data and improving the consistency of hyperparameter tuning outcomes. Following cross-validation, the final model was retrained on the full 90% training set using the best hyperparameter configuration and then evaluated on the fixed 10% test set to assess generalization performance. 
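The split arithmetic above can be sanity-checked with a short sketch. The 2543-sequence count comes from the text; the interleaved fold assignment is a simple illustrative scheme, not necessarily the authors' exact (balanced) partitioning:

```python
# 90/10 train/test split followed by 6-fold CV on the training portion.
n_total = 2543
n_train = int(n_total * 0.9)   # 2288 training samples (~90%)
n_test = n_total - n_train     # 255 held-out test samples (~10%)

indices = list(range(n_train))
k = 6
folds = [indices[i::k] for i in range(k)]  # six roughly equal folds

# Each CV iteration trains on 5 folds (5/6 of 90% = 75% of all data)
# and validates on the remaining fold (15% of all data).
val_fold = folds[0]
train_folds = [i for f in folds[1:] for i in f]
```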
In addition, we used Extreme Gradient Boosting (XGBoost, version 1.7.6; https://xgboost.ai) and Stacked Autoencoder (SAE) implemented in MATLAB (version R2023b; https://www.mathworks.com/products/matlab.html) to simulate both our proposed model and baseline approaches for comparative evaluation.
The classification was conducted using a SAE, optimized through the HSAPSO. The SAE architecture consisted of three encoding layers with 256, 128, and 64 neurons, a bottleneck layer of 32 neurons, and symmetrical decoding layers. ReLU activation was applied in hidden layers, with dropout (rate: 0.3) to prevent overfitting, and a softmax output layer for classification. The HSAPSO dynamically optimized hyperparameters, including the number of neurons in encoding layers (128, 512), learning rate (0.0001, 0.01), dropout rate (0.1, 0.5), and latent dimension (16, 64). Final optimized parameters included encoding layers of 256, 128, and 64 neurons, a learning rate of 0.001, and a dropout rate of 0.3. HSAPSO was configured with a population of 30 particles over 50 iterations, utilizing cognitive, social, and subgroup factors (2.0, 2.0, 1.5), with a dynamically decaying inertia weight (0.9 to 0.4). Training was conducted on a high-performance system with NVIDIA RTX 3080 GPU and Python-based tools, with early stopping to ensure optimal performance. This robust pipeline achieved highly accurate and efficient classification of druggable proteins.
Notably, all models were trained using early stopping based on validation loss, with a patience threshold of 10 epochs. While a maximum of 100 training epochs was allowed, convergence often occurred well before this limit. This approach ensured efficient training and helped prevent overfitting, particularly for deeper models such as optSAE.
Moreover, we employed SGD as our primary optimization method, prioritizing its stability and well-established convergence properties in deep network training. While adaptive optimizers such as Adam and AdamW offer faster initial convergence, they typically incur a greater computational cost per iteration, especially in large-scale deep learning models. Given the computational demands of optimizing a stacked autoencoder, SGD was selected to promote efficient memory utilization and avoid over-reliance on adaptive moment estimation, which can sometimes hinder generalization in deep architectures. Nevertheless, we performed supplementary experiments comparing SGD with AdamW, and the findings are detailed in subsequent sections.
Evaluation metrics
Binary (two-class) analysis criteria are employed to quantify the agreement between predictions and ground truth and to estimate errors. Accuracy, sensitivity, and specificity serve as the primary metrics for evaluating the method’s performance. These metrics are calculated based on the following variables:
True Positive (TP): The number of samples identified as druggable by the proposed algorithm that indeed have the potential to be converted into a drug, based on the sample data.
True Negative (TN): The number of samples correctly identified by the algorithm as non-druggable, aligning with the sample’s inability to be converted into a drug.
False Positive (FP): The number of samples incorrectly identified as druggable by the algorithm, when in reality they lack the potential for drug conversion.
False Negative (FN): The number of samples incorrectly identified as non-druggable, despite having the potential for drug conversion.
In evaluating the possibility of converting a sample into a drug, additional performance criteria include the detection ratio, false alarm ratio, and the balance between these two metrics. Other considerations, such as efficiency estimation, are equally important and include factors like execution speed, responsiveness, and error tolerance. These factors assess the algorithm’s ability to correctly classify samples for potential drug conversion.
The detection ratio is calculated as the proportion of correctly identified druggable samples, validated by expert opinion across various laboratory conditions. Such metrics are considered essential for assessing the performance of the proposed model and are consistent with those used in similar studies. These criteria provide a robust foundation for evaluating the efficiency and reliability of the proposed approach.
The performance of the proposed framework was evaluated using standard classification metrics, including accuracy, sensitivity, specificity, and F1-score, to ensure a comprehensive assessment of its predictive capabilities. These metrics were calculated based on true positive, true negative, false positive, and false negative rates, providing a detailed analysis of the model’s ability to classify druggable and non-druggable proteins effectively. Additionally, to assess the computational efficiency, the time complexity of the algorithm was analyzed. The computational complexity of the proposed framework is primarily focused on the training phase, encompassing feature extraction, SAE training, and HSAPSO optimization. For Ntrain training samples, the complexity of feature extraction, using dipeptide composition, is:
O(N_train · L_s)  (19)

where L_s is the average sequence length. The SAE training phase, with L layers, E epochs, and n_{l−1} and n_l input and output neurons at layer l, has a complexity of:

O(E · N_train · Σ_{l=1}^{L} n_{l−1} · n_l · n_a)  (20)

where n_a is the cost of the activation function. The HSAPSO optimization process, involving P particles over I iterations in a d-dimensional search space, is:

O(P · I · d)  (21)

Together, the total training complexity ensures robust optimization and efficient learning, while the lightweight testing phase, applied to N_test samples, involves minimal computation.
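As a concrete instance of the per-sample term inside Eq. (20), the multiply-accumulate count for the 400–256–128–64 encoder described earlier and its mirrored decoder can be tallied directly (the 32-neuron bottleneck variant mentioned in the implementation details is omitted here for simplicity):

```python
# Per-sample multiply-accumulate cost of one SAE forward pass:
# the sum of n_{l-1} * n_l over consecutive layers.
layer_sizes = [400, 256, 128, 64]              # encoder
full_path = layer_sizes + layer_sizes[-2::-1]  # mirror decoder: ...-128-256-400

macs = sum(a * b for a, b in zip(full_path, full_path[1:]))  # 286720
```

Multiplying this per-sample count by E epochs and N_train samples recovers the dominant term of Eq. (20).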
To quantitatively evaluate the variability in accuracy, the coefficient of variation (CV), a normalized measure of dispersion, is calculated alongside the standard deviation (σ). The equations are:
σ = √( (1/n) Σ_{i=1}^{n} (A_i − Ā)² ),  CV = (σ / Ā) × 100%  (22)

where A_i is the accuracy for each trial, Ā is the mean accuracy, and n is the number of trials. Additionally, to assess the dispersion further, the range of accuracies (R) and mean absolute deviation (MAD) are computed as follows:

R = A_max − A_min,  MAD = (1/n) Σ_{i=1}^{n} |A_i − Ā|  (23)

where A_max and A_min are the maximum and minimum accuracies, respectively. To complement these metrics, the variance ratio test is used to compare the variability of this model with competing methods. The formula is:

F = σ²_proposed / σ²_baseline  (24)

where σ_proposed and σ_baseline represent the standard deviations of the proposed model and a baseline model, respectively. Moreover, to ensure an objective and well-calibrated evaluation of the model, threshold values for accuracy, sensitivity, and specificity were determined through a combination of empirical analysis and statistical optimization. Initially, a default decision threshold of 0.5 was applied, aligning with standard binary classification practices, where predictions with a probability ≥ 0.5 were labeled as druggable proteins and those < 0.5 as non-druggable. To refine this threshold, Receiver Operating Characteristic (ROC) curve analysis was performed, and the optimal cutoff was identified using Youden’s Index (J = Sensitivity + Specificity − 1), ensuring a balanced trade-off between sensitivity and specificity. Further robustness validation was conducted using five-fold cross-validation, where the selected threshold was assessed across multiple dataset partitions to confirm its consistency and stability. This systematic approach not only optimized classification performance but also mitigated potential biases arising from dataset imbalance, reinforcing the reliability and generalizability of the proposed model.
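The dispersion measures of Eqs. (22)–(24) are straightforward to compute. The sketch below uses hypothetical per-trial accuracies within the reported 94.5–96.6% band and an assumed baseline standard deviation; the numbers are illustrative, not the study's measurements:

```python
import math

def dispersion_stats(acc):
    """Eqs. (22)-(23): standard deviation, CV, range, and MAD of trial accuracies."""
    n = len(acc)
    mean = sum(acc) / n
    sigma = math.sqrt(sum((a - mean) ** 2 for a in acc) / n)
    cv = sigma / mean * 100            # coefficient of variation, in percent
    rng_ = max(acc) - min(acc)         # range R
    mad = sum(abs(a - mean) for a in acc) / n
    return mean, sigma, cv, rng_, mad

accs = [95.1, 95.5, 96.0, 95.3, 95.8]  # hypothetical per-trial accuracies (%)
mean, sigma, cv, r, mad = dispersion_stats(accs)

# Eq. (24): variance ratio against an assumed baseline standard deviation.
sigma_baseline = 1.0
F = sigma ** 2 / sigma_baseline ** 2
```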
Experimental results
The results section of this study highlights the effectiveness of the proposed framework in optimizing druggable target identification and classification using the integrated SAE and HSAPSO. The performance metrics are derived from rigorous evaluations on a carefully curated drug design dataset. The framework achieved a remarkable classification accuracy of 95.52%, demonstrating its ability to handle high-dimensional data efficiently. Additionally, comparisons with traditional and state-of-the-art methods underscore the superiority of the proposed approach in terms of computational efficiency, robustness to noise, and scalability to large datasets. These findings validate the potential of this framework to streamline the drug discovery process and address challenges such as overfitting and optimization complexity.
The performance of the proposed SAE + HSAPSO classification framework was evaluated using a series of interference matrices based on two-by-two analyses. Three models were used for comparison:
Model 1: Classification using a basic autoencoder without parameter optimization.
Model 2: Classification using a SAE without parameter optimization.
Model 3: Classification using a SAE optimized via the HSAPSO algorithm.
Each model was tested under three different encoder layer configurations: low-layer count (Scenario 1), medium-layer count (Scenario 2), and high-layer count (Scenario 3). Key performance metrics—accuracy, sensitivity, and specificity—were measured for both training and testing phases, and results are presented in Tables 3, 4 and 5. Processing time for each scenario was also considered to assess computational efficiency and trade-offs.
Table 3.
Performance of the basic autoencoder without parameter optimization (Model 1).
| Scenario | Training accuracy (%) | Training sensitivity (%) | Training specificity (%) | Training avg. time (ms) | Testing accuracy (%) | Testing sensitivity (%) | Testing specificity (%) | Testing avg. time (ms) |
|---|---|---|---|---|---|---|---|---|
| 1 | 89.20 | 49.89 | 87.88 | 430 | 52.88 | 10.88 | 60.88 | 220 |
| 2 | 89.55 | 41.89 | 98.89 | 470 | 54.88 | 11.89 | 71.88 | 233 |
| 3 | 90.05 | 14.90 | 68.89 | 520 | 90.89 | 3.90 | 67.90 | 256 |
Table 4.
Performance of the SAE without parameter optimization (Model 2).
| Scenario | Training accuracy (%) | Training sensitivity (%) | Training specificity (%) | Training avg. time (ms) | Testing accuracy (%) | Testing sensitivity (%) | Testing specificity (%) | Testing avg. time (ms) |
|---|---|---|---|---|---|---|---|---|
| 1 | 90.86 | 91.09 | 90.73 | 610 | 90.76 | 90.79 | 90.40 | 302 |
| 2 | 91.12 | 91.36 | 91.98 | 730 | 90.43 | 90.16 | 90.85 | 370 |
| 3 | 91.26 | 91.03 | 91.41 | 880 | 91.16 | 91.01 | 91.36 | 392 |
Table 5.
Performance of the SAE with parameter optimization (Model 3).
| Scenario | Training accuracy (%) | Training sensitivity (%) | Training specificity (%) | Training avg. time (ms) | Testing accuracy (%) | Testing sensitivity (%) | Testing specificity (%) | Testing avg. time (ms) |
|---|---|---|---|---|---|---|---|---|
| 1 | 95.19 | 95.27 | 94.87 | 1070 | 95.01 | 95.15 | 94.84 | 513 |
| 2 | 95.87 | 96.26 | 95.66 | 1090 | 95.74 | 95.98 | 95.63 | 542 |
| 3 | 95.36 | 95.41 | 95.97 | 1160 | 95.27 | 95.33 | 95.17 | 580 |
The evaluations, conducted using ten-fold cross-validation for robust and unbiased performance assessment, revealed distinct outcomes across the three models. Model 1 provided a solid baseline with moderate accuracy and faster processing times. Model 2 improved accuracy, sensitivity, and specificity but introduced higher computational overhead. Model 3, utilizing the SAE + HSAPSO framework, demonstrated superior performance, achieving an average accuracy of 95%, along with enhanced sensitivity and specificity across all scenarios. Parameter optimization via HSAPSO contributed to a 3–4% improvement in classification performance compared to Models 1 and 2, solidifying Model 3 as the most effective approach. Tables 3, 4 and 5 summarize these findings, highlighting the trade-offs between accuracy and computational efficiency among the models. Notably, the ‘Average Time’ values in Tables 3, 4 and 5 reflect the mean computational time per epoch for training and the mean inference time per sample during testing. Each scenario was executed multiple times (N iterations) to ensure consistency, with training time including forward and backward propagation, while testing time accounts solely for forward inference.
The proposed method balances classification accuracy and computational complexity effectively. Even when the dataset was intentionally reduced by 30–50%, the method exhibited resilience to overfitting, with minimal drops in accuracy. While the processing time increased with more complex scenarios, the trade-off was justified by the accuracy gains, especially in pharmaceutical datasets where precision is paramount. Furthermore, it is noteworthy that the parameters for the SAE were estimated offline, meaning that the runtime performance excludes the optimization phase, ensuring competitive execution speed during classification.
The “Average Time” values reported in Tables 3, 4 and 5 reflect the training and inference times of each classification model after the hyperparameter optimization process was completed. These values indicate the per-epoch training time during model learning, and the average inference time per test sample. Importantly, they do not include the time consumed by the HSAPSO algorithm for searching optimal hyperparameters. Since HSAPSO is performed offline and only once prior to model training, its execution time is reported separately in the Results section to ensure a clear and unbiased evaluation of the optimization process. This clarification aims to avoid any confusion regarding the runtime performance of the optimized models versus the time cost of hyperparameter tuning itself.
The results in Tables 3, 4 and 5 highlight the performance and efficiency distinctions among the three models. Model 1, utilizing a basic autoencoder without parameter optimization, achieves moderate training accuracy (89.20–90.05%) but struggles in testing, with sensitivity as low as 10.88% in Scenario 1, despite the shortest testing time (220–256 ms). Model 2, with a stacked autoencoder, shows significant improvements, with testing accuracies exceeding 90% and a well-balanced sensitivity and specificity, although testing time increases to 302–392 ms due to added complexity. Model 3, the proposed SAE + HSAPSO framework, achieves the best performance, with accuracies surpassing 95% across all scenarios and consistently high sensitivity and specificity, exceeding 94%. While its testing time (513–580 ms) is higher, it remains under one second, making the method suitable for real-time applications. These results demonstrate the SAE + HSAPSO framework’s ability to balance exceptional classification performance with practical computational efficiency, making it the most effective solution for pharmaceutical data classification tasks.
Robustness performance analysis
The robustness and minimal dispersion of the proposed classification model are evident from the performance metrics across Fig. 3. Using the reported range of accuracies (94.5–96.6%) and a mean accuracy of approximately 95.5%, the standard deviation is estimated at 0.7%, and the coefficient of variation is approximately 0.73%. This extremely low CV indicates that the model’s performance is highly consistent and exhibits minimal variation across scenarios.
Fig. 3.
Example classification results from the proposed algorithm, presented as confusion matrices. The four independent runs shown were randomly selected from multiple analyses to provide a representative sample. The minimal accuracy variation across runs highlights the model’s consistency and robustness.
From the given range (94.5–96.6%), R = 2.1%, and the MAD is approximately 0.55%, further confirming that the model’s predictions are tightly clustered around the mean. With σ_proposed ≈ 0.7% and a typical baseline σ_baseline ≈ 1.0%, the variance ratio (F) is approximately 0.49, indicating that the proposed model has significantly lower dispersion compared to other methods. Furthermore, the performance consistency is evident in the confusion matrices, where sensitivity and specificity consistently exceed 94%. The equations for these metrics, combined with the statistical measures above, confirm the model’s resistance to overfitting and its reliable generalization capabilities. These observations, supported by minimal standard deviation, low CV, and a favorable F-ratio, establish the proposed method as a statistically robust and effective classification framework.
Validated generalization capability
The “Validated Generalization Capability” of a machine learning model is a crucial factor, particularly in pharmaceutical applications where robustness and adaptability to diverse datasets are essential. The SAE + HSAPSO framework leverages ROC curve and Area Under the Curve (AUC) metrics to validate its classification performance across both validation and unseen datasets. These metrics not only assess the model’s ability to distinguish between classes accurately but also ensure that it maintains high classification strength without overfitting. Figure 4 illustrates ROC curves with steep inclines toward the top-left corner, reflecting high sensitivity (True Positive Rate) and low False Positive Rate (FPR). The AUC values, ranging from 0.949 to 0.981 across validation and unseen datasets, confirm the stability and robustness of the SAE + HSAPSO framework.
Fig. 4.
ROC curves for the optSAE + HSAPSO framework across validation and test datasets.
The AUC for test data remains consistently stable, demonstrating that the model generalizes effectively across unseen samples. In contrast, the validation AUC exhibits slightly more variation, which can be attributed to the fact that the validation phase involves repeated evaluations on multiple random partitions of the dataset. Despite this minor variation, the model’s ability to generalize remains intact, as evidenced by its consistently high AUC scores. The robust performance of SAE + HSAPSO, particularly on test data, underscores its resilience and confirms its capability to handle real-world pharmaceutical classification challenges with precision.
The SAE + HSAPSO framework exhibits exceptional generalization capability, as evidenced by the narrow gap in AUC values between validation and unseen datasets. Even in the most challenging scenarios, the AUC for unseen data remains above 0.949, reinforcing the model’s adaptability. As presented in the Fig. 4, the AUC values on both the validation and test datasets remain consistently high (ranging from 0.961 to 0.967), indicating that the model performs robustly and generalizes well. The close alignment between the two curves also reflects balanced behavior with no evident bias toward either class. Achieving balanced classification ensures that both druggable and non-druggable targets are identified with equal accuracy, preventing biases that could impact drug discovery decisions.
The consistency of the SAE + HSAPSO model across validation and unseen datasets is further reinforced by the small variation in AUC values (0.949–0.981), which highlights its stability and resilience to data variability. Statistical evaluations, including the coefficient of variation and mean absolute deviation, further corroborate the model’s robustness under diverse conditions. The high AUC scores serve as aggregate indicators of sensitivity and specificity, validating the framework’s reliability for pharmaceutical classification tasks.
The ROC curves and corresponding AUC metrics presented in Fig. 4 provide compelling evidence of the SAE + HSAPSO framework’s validated generalization capability. Its robust, balanced, and consistent performance across validation and unseen datasets, along with strong supporting statistical metrics, firmly establishes it as a reliable and efficient tool for pharmaceutical classification. By successfully addressing challenges such as overfitting and data variability while maintaining high precision and adaptability, the SAE + HSAPSO approach presents a powerful and scalable solution for drug discovery applications.
Optimization convergence analysis
The optimization convergence analysis of the proposed optimized SAE (hereafter optSAE) model, bolstered by the HSAPSO algorithm, reveals its capability to outperform traditional autoencoder (AE) and stacked autoencoder (SAE) structures. As demonstrated in Fig. 5, the optSAE model consistently converges faster to the optimal value, with superior accuracy and stability across training and validation datasets. This behavior underscores the strength of HSAPSO in dynamically fine-tuning parameters, enabling optSAE to address critical challenges in drug discovery, including poor generalization and limited interpretability. Unlike conventional structures, optSAE’s deeper layers and expanded channels facilitate effective feature learning and robust classification, positioning it as a cutting-edge solution for pharmaceutical applications.
Fig. 5.
Comparison of the optSAE model with other transfer learning structures, including AE and SAE. The figure highlights the superiority of optSAE in terms of: (a) convergence of accuracy towards the optimal value for training data, (b) loss calculation for training data, (c) convergence of accuracy for validation data, and (d) loss calculation for validation data.
A key highlight of the optSAE model is its efficiency in achieving zero error (optimal convergence) faster than comparable methods. Figure 5a,c display the accuracy curves, while Fig. 5b,d show the corresponding loss curves, illustrating the convergence behavior of the model during training.
While models such as SAE and AE demonstrate slower and less consistent convergence, the optSAE model mitigates these issues by leveraging a carefully configured architecture that prioritizes stability and adaptability. Despite the optSAE model’s slightly higher computational complexity, its faster convergence and superior accuracy justify the trade-off, ensuring scalability in real-world pharmaceutical datasets.
Its ability to generalize effectively is evident from the consistent accuracy metrics across validation data, even when applied to previously unseen datasets. This generalizability is attributed to the integration of HSAPSO, which optimizes deeper layers and richer feature maps without introducing overfitting. Unlike traditional AE and SAE architectures that struggle with the diversity and complexity of pharmaceutical data, the optSAE framework excels by capturing subtle patterns and complex relationships in the dataset. The stability observed in the convergence plots demonstrates the model’s resistance to parameter oscillations, ensuring reliable performance across various classification scenarios.
The architectural design of optSAE sets it apart from other transfer learning models. By employing a higher number of layers, the model overcomes challenges related to interpretability and computational efficiency while maintaining its ability to learn complex representations. The deeper layers and increased channels enable optSAE to extract highly discriminative features, crucial for drug classification tasks. Moreover, the convergence plots underscore the model’s adaptability to pharmaceutical data with limited fine-tuning. As the volume of data increases, the advantages of optSAE over traditional methods such as AE and SAE are expected to grow, making it a scalable and future-proof solution for drug design. This blend of convergence efficiency, robustness, and superior feature extraction establishes optSAE as a transformative innovation in pharmaceutical machine learning.
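A rough sketch of the stacked encoder-decoder layout follows. The exact layer sizes are not stated in this section; the 2543-feature input matches the reported feature set, while the 256-unit encoding layer and 64-dimensional latent space are taken from the Table 7 settings and are assumptions about the layout:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def make_sae(encoder_dims, seed=0):
    """Weights for a stacked autoencoder: encoder dims plus a mirrored decoder."""
    rng = np.random.default_rng(seed)
    dims = encoder_dims + encoder_dims[-2::-1]  # e.g. [2543, 256, 64, 256, 2543]
    return [(rng.normal(0.0, 0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    h = x
    for W, b in layers:
        h = relu(h @ W + b)
    return h

# 2543 input features (DrugBank + Swiss-Prot); hidden/latent widths assumed
# from Table 7's selected settings (256 neurons, 64-dim latent).
layers = make_sae([2543, 256, 64])
x = np.ones((1, 2543))
latent = forward(layers[:2], x)   # encoder half: compressed representation
recon = forward(layers, x)        # full pass: reconstruction of the input
```

The mirrored decoder is a common convention for stacked autoencoders; the deeper encoding path is what the text credits for the more discriminative features.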
The observed acceleration in convergence for optSAE can be attributed to two key factors. First, the deeper and better-balanced layer design enhances gradient flow and facilitates the learning of compact yet expressive latent features, reducing reconstruction and classification error more efficiently. Second, the use of HSAPSO ensures optimal selection of learning rates, dropout, and neuron counts through hierarchical adaptation, avoiding local optima and accelerating convergence. Together, these aspects contribute to optSAE’s rapid and stable convergence compared to baseline SAE or other optimized variants.
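The hierarchical adaptation described above can be sketched as follows. The paper’s exact HSAPSO update rules are not reproduced here; this is a minimal PSO variant whose inertia weight follows a global (top-level) decay schedule, offered purely as an illustration of optimizing hyperparameters such as learning rate and dropout:

```python
import random

def hsapso_sketch(objective, bounds, n_particles=30, iters=50, seed=0):
    """Minimal PSO with a globally scheduled inertia decay.

    Illustrative stand-in for hierarchical self-adaptation; the paper's
    exact HSAPSO update rules are not reproduced here.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    c1 = c2 = 1.5  # cognitive / social coefficients (kept fixed in this sketch)
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters  # top-level schedule: inertia decays over time
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Surrogate objective standing in for validation loss over
# (learning rate, dropout); its optimum mirrors Table 7's final settings.
best, best_f = hsapso_sketch(
    lambda p: (p[0] - 0.001) ** 2 + (p[1] - 0.3) ** 2,
    bounds=[(1e-4, 1e-2), (0.1, 0.5)],
)
```

The 30 particles and 50 iterations match the configuration reported later for the optSAE + HSAPSO run; everything else (coefficients, decay schedule, surrogate objective) is assumed for illustration.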
Ablation study
Table 6 presents key performance metrics, highlighting the practical utility and reliability of our models. Computational complexity (seconds/sample) reflects the average inference time per sample, derived from multiple runs, indicating real-time applicability. Training time (seconds/iteration) is the average duration of an optimization step, measured consistently across epochs. Convergence iterations indicate the number of training cycles needed to reach stability, determined by monitoring the loss function, with fewer iterations signifying greater efficiency. Stability (standard deviation) assesses robustness, with lower values denoting more consistent performance and reduced deployment risk. All metrics in Table 6 are based on the final trained models evaluated on the test set. Notably, the reported results exclude any time spent on hyperparameter tuning, which is handled separately and conducted offline.
Table 6.
Comparative performance metrics of various feature extraction and optimization combinations, including SAE and optSAE paired with optimization algorithms (SGA, SPSO, HSAPSO), as well as their standalone implementations. Note: optSAE is a structurally enhanced SAE variant with deeper encoding-decoding layers, not merely a hyperparameter-tuned version; this distinction accounts for its superior performance.
| Method | Accuracy (%) | F1-score (%) | Recall (%) | Specificity (%) | Computational complexity (s/sample) | Convergence iterations | Stability (Std. Dev.) | AUC (%) |
|---|---|---|---|---|---|---|---|---|
| SAE | 87.6 | 85.4 | 83.5 | 89.2 | 0.012 | 40 | ± 0.010 | 90.3 |
| optSAE | 91.4 | 89.8 | 88.3 | 92.5 | 0.011 | 35 | ± 0.007 | 92.8 |
| SAE + SGA | 88.2 | 86.7 | 85.2 | 90.1 | 0.016 | 50 | ± 0.009 | 91.0 |
| optSAE + SGA | 92.3 | 90.9 | 89.6 | 93.7 | 0.014 | 45 | ± 0.006 | 93.5 |
| SAE + SPSO | 89.5 | 88.0 | 86.5 | 91.8 | 0.015 | 30 | ± 0.008 | 92.2 |
| optSAE + SPSO | 94.1 | 92.7 | 91.3 | 94.9 | 0.013 | 25 | ± 0.005 | 95.0 |
| SAE + HSAPSO | 90.8 | 89.4 | 88.1 | 92.2 | 0.014 | 20 | ± 0.006 | 94.0 |
| Proposed model | 95.8 | 94.6 | 93.5 | 96.8 | 0.010 | 15 | ± 0.003 | 96.5 |
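The per-sample inference time and run-to-run stability reported in Table 6 can be measured as in the following minimal sketch (function names and sample values are illustrative, not from the paper’s code):

```python
import statistics
import time

def per_sample_latency(model_fn, samples, runs=5):
    """Average inference time per sample (s/sample), averaged over runs."""
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for s in samples:
            model_fn(s)
        per_run.append((time.perf_counter() - start) / len(samples))
    return statistics.mean(per_run)

def stability(metric_per_run):
    """Stability reported as the standard deviation across independent runs."""
    return statistics.stdev(metric_per_run)

# Hypothetical accuracies from three repeated runs of one model;
# the spread works out to 0.003, matching Table 6's "+/- 0.003" convention.
spread = stability([0.955, 0.952, 0.958])
```

Averaging over several timed runs, as done here, is what makes the seconds-per-sample figures robust to transient system load.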
The standalone SAE and optSAE models demonstrate their capabilities as baseline feature extraction methods, with the optSAE outperforming SAE in all metrics. SAE achieves an accuracy of 87.6%, with an F1-Score of 85.4%, recall of 83.5%, and specificity of 89.2%. Its computational complexity of 0.012 s/sample reflects moderate efficiency, but its higher standard deviation (± 0.010) indicates limited stability. Conversely, optSAE, with deeper layers and improved feature representations, achieves an accuracy of 91.4%, F1-Score of 89.8%, and a lower standard deviation (± 0.007), indicating enhanced reliability. However, both models lack optimization, which limits their ability to reach peak performance, particularly in highly complex datasets. Standard methods, defined as optimization techniques with predefined parameter configurations sourced from benchmark studies, serve as essential baselines for evaluating the efficacy of innovative approaches.
When SAE and optSAE are combined with optimization methods like SGA, SPSO, and HSAPSO, significant improvements are observed across all metrics. For instance, SAE + SGA improves accuracy to 88.2%, F1-Score to 86.7%, and shows a computational complexity of 0.016 s/sample, though its slower convergence (50 iterations) highlights inefficiencies in parameter tuning. optSAE + SGA achieves better accuracy (92.3%) and F1-Score (90.9%), reflecting the benefit of coupling optSAE with SGA, but convergence remains slower compared to SPSO and HSAPSO. The combination of SAE and SPSO further enhances accuracy to 89.5%, while optSAE + SPSO significantly improves performance, achieving 94.1% accuracy and a lower computational complexity of 0.013 s/sample, demonstrating the effectiveness of SPSO in balancing exploration and exploitation during optimization.
Building on these gains, the proposed optSAE + HSAPSO model sets a new benchmark for robust and reliable classification, achieving an accuracy of 95.8%, F1-Score of 94.6%, recall of 93.5%, and specificity of 96.8%. Notably, its computational complexity is reduced to 0.010 s/sample, and it converges in just 15 iterations, highlighting the efficiency of HSAPSO’s adaptive parameter tuning. Moreover, its stability (standard deviation of ± 0.003) and AUC of 96.5% underscore the method’s resilience and generalizability.
Hyperparameter optimization and convergence
To clarify the computational cost of hyperparameter optimization, we provide detailed timing and configuration information in Table 7. Specifically, the total runtime required for each optimizer (HSAPSO, SPSO, and SGA) to converge on optimal hyperparameter settings is reported separately from model training and inference times. For example, the optSAE + HSAPSO configuration required approximately 870 s of optimization time using 30 particles over 50 iterations.
Table 7.
Offline optimization time and training runtime per iteration, along with hyperparameter search spaces and final selected values for each optimizer configuration.
| Configuration | Optimizer | Optimization time (s) | Training time per iteration (s) | Search space | Final settings |
|---|---|---|---|---|---|
| optSAE + SGA | SGA | 770 | 0.02 | LR: [0.0001-0.01], Dropout: [0.1–0.5], Neurons: [64–512], Latent: [16–128] | LR: 0.001, Dropout: 0.3, Neurons: 256, Latent: 64 |
| optSAE + SPSO | SPSO | 835 | 0.02 | LR: [0.0001-0.01], Dropout: [0.1–0.5], Neurons: [64–512], Latent: [16–128] | LR: 0.001, Dropout: 0.3, Neurons: 256, Latent: 64 |
| optSAE + HSAPSO | HSAPSO | 870 | 0.018 | LR: [0.0001-0.01], Dropout: [0.1–0.5], Neurons: [64–512], Latent: [16–128] | LR: 0.001, Dropout: 0.3, Neurons: 256, Latent: 64 |
Additionally, Table 7 specifies the full search space explored for each hyperparameter—namely, learning rate ([0.0001–0.01]), dropout rate ([0.1–0.5]), latent dimension ([16–128]), and number of neurons in encoding layers ([64–512])—as well as the final values selected by each optimizer. This separation of concerns ensures that model efficiency and optimizer cost are evaluated independently, addressing the reviewer’s concern about potential misinterpretation of timing data in performance tables.
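The Table 7 search space lends itself to a simple bounds-checking structure; the sketch below is illustrative, not the authors’ implementation:

```python
# Search space from Table 7 (identical ranges for every optimizer configuration).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),
    "dropout": (0.1, 0.5),
    "neurons": (64, 512),
    "latent_dim": (16, 128),
}

# Final settings selected by all three optimizers, as reported in Table 7.
FINAL_SETTINGS = {"learning_rate": 0.001, "dropout": 0.3,
                  "neurons": 256, "latent_dim": 64}

def in_space(config, space):
    """Check that every hyperparameter lies within its reported bounds."""
    return all(space[k][0] <= v <= space[k][1] for k, v in config.items())
```

Separating the search-space definition from the optimizer, as here, mirrors the paper’s separation of offline tuning cost from model training and inference time.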
Hyperparameter optimization is a computationally intensive yet crucial step in training robust and high-performing models. Therefore, through both quantitative results and convergence analysis, we evaluate and contrast the behavior of HSAPSO with alternative methods such as SPSO, SGA, Random Search, and the Tree-structured Parzen estimator (TPE). As shown in Table 8, although the proposed HSAPSO-based configuration exhibits the highest offline optimization time (870 s), this overhead is incurred only once, prior to training and inference, and does not affect real-time application. In contrast, the actual deployment of the model—including classification speed and inference time—benefits from highly tuned parameters that reduce complexity and improve stability. What truly sets HSAPSO apart is not just marginal gains in accuracy (95.8%) or recall (93.5%), but the consistency and efficiency of the learned configuration across evaluation metrics. By systematically and hierarchically navigating the search space, HSAPSO converges on highly generalizable solutions that maintain minimal variance across different runs, a feature particularly critical in biomedical applications where reproducibility is non-negotiable.
Table 8.
Performance and convergence comparison of hyperparameter optimization methods applied to optSAE, including offline tuning time and iterations to convergence.
| Configuration | Accuracy (%) | F1-score (%) | Recall (%) | Specificity (%) | Optimization time (s) | Convergence iterations |
|---|---|---|---|---|---|---|
| optSAE + Random Search | 91.2 | 89.5 | 88.0 | 91.5 | 610 | 58 |
| optSAE + TPE | 92.1 | 90.7 | 89.3 | 92.0 | 720 | 47 |
| optSAE + SGA | 92.3 | 90.9 | 89.6 | 93.7 | 770 | 53 |
| optSAE + SPSO | 94.1 | 92.7 | 91.3 | 94.9 | 835 | 33 |
| optSAE + HSAPSO | 95.8 | 94.6 | 93.5 | 96.8 | 870 | 15 |
Moreover, a key differentiator of HSAPSO lies in its superior convergence behavior. As shown in the final column of Table 8, the number of iterations required for convergence in the HSAPSO setup is just 15—the lowest among all compared methods, including advanced strategies such as TPE (47 iterations) and SPSO (33 iterations). This demonstrates HSAPSO’s remarkable ability to exploit the search space efficiently, reducing training cycles and energy consumption during the optimization process. While methods like random search may offer quicker optimization times, they often require significantly more iterations to stabilize and may produce suboptimal configurations. On the other hand, HSAPSO’s adaptive learning coefficients and hierarchical adjustment strategies enable it to converge faster to high-quality configurations, ultimately leading to a better-trained model in fewer steps. This efficiency is not just computational—it translates directly into a more agile modeling pipeline, ideal for iterative experimentation or rapid deployment scenarios.
In addition, the convergence plots offer a compelling visualization of the efficiency and adaptability of the proposed Hierarchically Self-Adaptive PSO (HSAPSO) algorithm compared to Standard PSO (SPSO) and the Standard Genetic Algorithm (SGA) (see Fig. 6). HSAPSO demonstrates a transformative improvement by dynamically adapting its parameters during optimization, enabling faster and more stable convergence across diverse datasets.
Fig. 6.
Convergence analysis of various optimization algorithms (SGA, SPSO, and HSAPSO) combined with the optSAE framework.
In the training data 1 plot, HSAPSO (red line) achieves near-zero error within the first 10–15 iterations, far outperforming SPSO (blue line), which converges at approximately 20 iterations. SGA (green line), while eventually reaching comparable accuracy, lags significantly, requiring over 35 iterations to converge. The ability of HSAPSO to rapidly stabilize reflects its hierarchical adaptability, which balances exploration of the search space with precise exploitation of optimal solutions. This contrasts with SPSO’s fixed parameter design, which, while effective, lacks the flexibility to dynamically adjust to the optimization landscape, and SGA’s reliance on slow mutation and crossover processes, which inhibit its efficiency for complex high-dimensional problems. Similarly, in the training data 2 plot, HSAPSO’s performance remains consistently superior, achieving convergence within 10 iterations, regardless of variations in the dataset. SPSO follows a similar trajectory to training data 1 but is again outpaced by HSAPSO due to its lack of adaptive tuning. SGA continues to exhibit the slowest convergence, reinforcing its inefficiency for problems requiring rapid optimization. The consistent dominance of HSAPSO across datasets highlights its robustness and scalability. By leveraging hierarchical adjustments to key parameters like inertia weight and learning coefficients, HSAPSO ensures optimal convergence trajectories tailored to the data, an advantage that neither SPSO nor SGA can replicate.
These results firmly establish HSAPSO as the most effective optimization method for the proposed SAE framework. The rapid convergence, reduced error rates, and remarkable stability across diverse datasets position HSAPSO as a state-of-the-art solution in optimization tasks, particularly in pharmaceutical applications where computational efficiency and accuracy are critical. Standard approaches like SPSO and SGA, while valuable benchmarks, lack the adaptive flexibility and hierarchical learning mechanisms that drive HSAPSO’s success. This method not only surpasses the baseline in performance but also sets a new benchmark for optimizing deep learning models in high-stakes domains like drug discovery.
It is important to note that the convergence curves in Fig. 6 were generated using the complete training dataset, which includes both druggable and non-druggable samples. This setup allows a balanced evaluation of optimizer behavior under realistic conditions, without filtering based on class labels, thereby reflecting general convergence trends across all sample types.
To further investigate the impact of optimizer selection, we compared SGD and AdamW in terms of both convergence speed and final model accuracy. While AdamW demonstrated faster initial convergence, its final classification accuracy remained comparable to that of SGD, with a marginal improvement of approximately 0.4–0.8% across different experimental scenarios. However, this performance gain came at the cost of increased per-epoch computation time, where AdamW exhibited an average 30–40% increase in training time per epoch due to the additional weight decay correction. Moreover, prior research suggests that adaptive optimizers like Adam and AdamW can lead to suboptimal generalization in certain deep models, as they may cause overfitting to the training distribution. Given that our primary objective was to achieve robust generalization with computational efficiency, SGD remained the preferred choice as it provided a balanced trade-off between model performance and training efficiency. Thus, while AdamW slightly improved convergence speed, the additional computational overhead and the risk of overfitting made it a less practical choice for this particular task.
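The decoupled weight decay that distinguishes AdamW from plain SGD can be illustrated on a one-parameter toy problem. This is a generic sketch of the two update rules, not the paper’s training setup:

```python
def sgd(grad, x0, lr=0.1, steps=200):
    # Plain SGD on a single parameter: x <- x - lr * grad(x).
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adamw(grad, x0, lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    # AdamW: Adam's bias-corrected moment estimates plus decoupled weight
    # decay -- the extra per-step correction discussed in the text.
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x -= lr * (m_hat / (v_hat ** 0.5 + eps) + wd * x)
    return x

# Toy convex objective f(x) = (x - 3)^2; both optimizers approach x* = 3.
grad = lambda x: 2.0 * (x - 3.0)
x_sgd = sgd(grad, 0.0)
x_adamw = adamw(grad, 0.0)
```

The extra moment bookkeeping and decay term in `adamw` is the source of the per-step overhead noted above; on well-conditioned problems like this one, plain SGD reaches the optimum with far less work per step.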
State-of-the-art comparison
The proposed optSAE + HSAPSO framework demonstrates clear superiority over existing state-of-the-art classification methods across key evaluation metrics, including accuracy, computational complexity, and stability (see Table 9). To ensure fair and unbiased comparisons, all methods—both the proposed and baseline models—were implemented and tested under identical experimental settings. This consistent setup highlights the practical advantages of our approach. Notably, with an accuracy of 95.52%, our model surpasses leading methods such as Chen et al.25 (95.00%) and Sikander et al.26 (94.86%), demonstrating its exceptional capability in distinguishing druggable from non-druggable targets with high precision. This high accuracy is achieved despite handling a significantly larger feature set (2543 features from DrugBank and Swiss-Prot) compared to methods like Jamali et al.18 (443 features) and Lin et al.24 (143 features). The ability to process such a vast feature space reflects the robustness and scalability of the proposed method, making it ideal for complex pharmaceutical datasets.
Table 9.
Comparative analysis of various state-of-the-art methods for drug classification, including their accuracy, computational complexity (seconds per sample), and stability (standard deviation).
| Author | Method | Features | Accuracy (%) | Computational complexity (s/sample) | Stability (Std. Dev.) |
|---|---|---|---|---|---|
| Jamali et al.18 | SVM classification | 443 | 89.78 | 0.015 | ± 0.012 |
| Lin et al.24 | SVM classification | 143 | 93.78 | 0.013 | ± 0.010 |
| Chen et al.25 | XGBoost classification | 155 | 94.64 | 0.022 | ± 0.008 |
| Sikander et al.26 | XGB-DrugPred classification | Large Features | 94.86 | 0.025 | ± 0.006 |
| Chen et al.38 | Deep graph convolutional networks classification | Large Features | 95.00 | 0.028 | ± 0.005 |
| Proposed method | optSAE + HSAPSO classification | 2543 (DrugBank + Swiss-Prot) | 95.52 | 0.010 | ± 0.003 |
The computational complexity of the optSAE + HSAPSO method is the lowest among the compared approaches, requiring only 0.010 s per sample, which is a significant improvement over methods like Chen et al.38 (0.028 s/sample) and Sikander et al.26 (0.025 s/sample). This efficiency stems from the hierarchical self-adaptive parameter tuning offered by HSAPSO, which accelerates convergence and ensures optimal feature extraction. While simpler methods like Jamali et al.18 (0.015 s/sample) and Lin et al.24 (0.013 s/sample) are slightly faster, they compromise accuracy and feature richness, making them less suitable for real-world applications.
Table 9 extends the comparative evaluation by benchmarking the proposed framework against state-of-the-art methods. The computational complexity, training time, convergence iterations, and stability metrics follow the same calculation methodology as detailed in Table 6, maintaining consistency across evaluations. This comparative analysis highlights the efficiency and robustness of our approach relative to existing classification methods.
The stability of the proposed framework is another key differentiator, with a standard deviation of only ± 0.003, reflecting its consistent performance across runs and datasets. Compared to Chen et al.25 (± 0.005) and Sikander et al.26 (± 0.006), which are relatively stable, the optSAE + HSAPSO demonstrates exceptional reliability. This stability ensures resilience against variability in data and parameters, a critical factor in pharmaceutical classification tasks. By combining the feature extraction strength of optSAE with the adaptive optimization capabilities of HSAPSO, the proposed method achieves a harmonious balance of accuracy, efficiency, and robustness, positioning itself as the most effective solution for drug discovery and classification.
Discussion and interpretation
The findings of this study demonstrate the significant advantages of the proposed optSAE + HSAPSO framework over existing state-of-the-art methods in drug classification. With its superior accuracy (95.52%), computational efficiency (0.010 s per sample), and high stability (± 0.003), the method establishes itself as a robust and scalable solution for handling complex pharmaceutical datasets. Compared to conventional approaches like SVM18,19, XGBoost19,25,28,31, and Deep Graph Convolutional Networks38, the proposed framework effectively addresses challenges such as overfitting, inefficiency, and poor generalization to unseen data, showcasing exceptional performance across validation and testing scenarios.
One of the key strengths of the optSAE + HSAPSO method lies in its adaptability and scalability. Unlike traditional classifiers that struggle with smaller feature spaces or high computational demands, this framework excels in processing large, complex feature sets extracted from DrugBank and Swiss-Prot databases. The hierarchical self-adaptive optimization of HSAPSO dynamically tunes parameters, ensuring faster convergence and better alignment with the dataset’s characteristics. This capability enables the framework to outperform competitors in critical metrics such as convergence speed and classification accuracy, as evidenced by the ROC and convergence plots.
The practical implications of this research are considerable, particularly for large-scale drug discovery, where minimizing computational cost without compromising accuracy is paramount. We rigorously evaluated computational efficiency and stability by running both our proposed method and the comparative methods under identical, strictly controlled conditions. A pivotal part of this evaluation was the analysis of optimizer selection, specifically the direct comparison of SGD and AdamW with respect to convergence speed and final model accuracy. While AdamW converged more rapidly in the early epochs, its final classification accuracy offered only a marginal improvement over SGD, on the order of 0.4–0.8%, across experimental settings, and this modest gain was accompanied by a substantial increase in per-epoch training time due to the additional weight decay correction. Furthermore, existing literature suggests that adaptive optimizers such as Adam and AdamW may hinder the generalization of deep models by promoting overfitting to the training distribution. Given our objective of balancing model generalization against computational efficiency, SGD was selected as the optimization algorithm, providing a robust trade-off between accuracy and training duration. The inherent scalability of our approach facilitates seamless adaptation to expanding datasets, thereby enhancing its applicability to predictive drug discovery, target identification, and disease classification. Finally, the minimal performance variability observed reinforces the reliability of our methodology for real-world biomedical applications.
Despite its strengths, the optSAE + HSAPSO framework is not without limitations. The dependency on high-quality feature extraction and the need for substantial training data to achieve optimal performance remain challenges. The computational complexity, while lower than other state-of-the-art methods, may still pose issues when applied to extremely high-dimensional datasets or resource-limited environments. Additionally, the framework requires fine-tuning of the optSAE structure and parameter optimization, which, while automated, demands careful implementation and testing. These limitations highlight areas for further improvement and refinement of the method.
Looking ahead, future research could focus on extending the framework to other bioinformatics applications, such as cancer diagnostics or genetic data analysis. Exploring alternative optimization techniques and enhancing the interpretability of the deep learning model could further improve its effectiveness. Additionally, integrating the method with real-time drug screening pipelines could open new avenues for accelerating the drug discovery process. The optSAE + HSAPSO framework lays a solid foundation for scalable, efficient, and reliable pharmaceutical classification, setting the stage for transformative advancements in the field.
Conclusion
This research introduced the optSAE + HSAPSO framework, an innovative integration of stacked autoencoders and hierarchically self-adaptive PSO, designed to address the challenges of drug classification. The framework demonstrated superior performance across key metrics, achieving an impressive accuracy of 95.52%, with unparalleled computational efficiency (0.010 s per sample) and remarkable stability (± 0.003). By leveraging the adaptive capabilities of HSAPSO, the method overcomes traditional challenges such as overfitting and inefficiencies in high-dimensional feature spaces. Its ability to process extensive datasets from DrugBank and Swiss-Prot highlights its scalability and robustness. These results position the optSAE + HSAPSO framework as a cutting-edge tool for pharmaceutical classification tasks, providing a significant advancement over state-of-the-art methods in bioinformatics and drug discovery.
The broader implications of this framework extend beyond its immediate application in drug classification. Its efficiency and reliability make it a compelling choice for integration into high-throughput pharmaceutical pipelines, enabling faster and more accurate identification of druggable targets. By significantly reducing computational overhead while maintaining exceptional precision, the framework paves the way for scalable solutions in drug screening and predictive modeling. Despite these strengths, the framework is not without limitations. It relies on high-quality feature extraction and sufficient training data to achieve optimal performance, which could pose challenges in resource-constrained environments. Additionally, further efforts are needed to enhance the interpretability of the framework, ensuring its broader adoption in clinical and industrial settings.
Future research could build on these findings by applying the framework to other domains such as disease diagnostics or genetic data analysis. Exploring alternative optimization algorithms or hybrid approaches could further improve its scalability and robustness. Moreover, integrating the framework into real-time systems for drug screening and discovery offers exciting possibilities for accelerating pharmaceutical research. The optSAE + HSAPSO framework represents a transformative advancement in drug classification, offering a balance of accuracy, efficiency, and adaptability that sets a new benchmark in the field. This work provides a solid foundation for future innovations in pharmaceutical informatics, demonstrating its potential to reshape how drugs are discovered and developed.
Author contributions
S.S.M.: Conceptualization, methodology, data curation, writing-original draft, validation. K.R.: Conceptualization, supervision, methodology, writing-review & editing, funding acquisition, project administration. M.A.: Software, formal analysis, visualization, investigation. H.E.: Data curation, resources, validation, writing-review & editing. S.S.: Investigation, validation, writing-review & editing. M.H.A.: Methodology, formal analysis, validation, visualization.
Funding
No funding was received.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. Protein data were collected from the DrugBank (https://www.drugbank.ca) and Swiss-Prot (https://www.uniprot.org/uniprotkb?query=reviewed:true) databases, which are widely recognized repositories for drug-target and protein sequence information.
Declarations
Competing interests
The authors declare no competing interests.
AI-Assisted writing declaration
During the preparation of this manuscript, the authors utilized AI-based language editing tools solely for grammatical correction and language refinement. No AI-generated content or AI-assisted writing tools were used in the conceptualization, research development, data analysis, or manuscript composition. The authors thoroughly reviewed and manually refined the text to ensure scientific accuracy, originality, and adherence to academic standards. The authors take full responsibility for the final content of this publication.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Khosro Rezaee, Email: kh.rezaee@meybod.ac.ir.
Mojtaba Ansari, Email: ansari@meybod.ac.ir.
References
- 1. Mouchlis, V. D. et al. Advances in de novo drug design: from conventional to machine learning methods. Int. J. Mol. Sci. 22 (4), 1676 (2021).
- 2. Masoomkhah, S. S., Rezaee, K., Ansari, M. & Eslami, H. Deep learning in drug design—progress, methods, and challenges. Front. Biomed. Technol. 11 (3), 492–508 (2024).
- 3. Yin, S., Mi, X. & Shukla, D. Leveraging machine learning models for peptide–protein interaction prediction. RSC Chem. Biol. 5 (5), 401–417 (2024).
- 4. Kumar, S. A. et al. Machine learning and deep learning in data-driven decision making of drug discovery and challenges in high-quality data acquisition in the pharmaceutical industry. Future Med. Chem. 14 (4), 245–270 (2022).
- 5. Horton, J. T., Allen, A. E., Dodda, L. S. & Cole, D. J. QUBEKit: automating the derivation of force field parameters from quantum mechanics. J. Chem. Inf. Model. 59 (4), 1366–1381 (2019).
- 6. Kumari, N. & Hasija, Y. CADD: exploring the digital frontier in drug designing. In 3rd Int. Conf. Comput. Model. Simul. Optim. (ICCMSO), 272–277 (2024).
- 7. Selvaraj, C., Chandra, I. & Singh, S. K. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol. Divers. 1–21 (2021).
- 8. Ghislat, G., Hernandez-Hernandez, S., Piyawajanusorn, C. & Ballester, P. J. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin. Drug Discov. 19 (11), 1297–1307 (2024).
- 9. Mak, K. K. & Pichika, M. R. Artificial intelligence in drug development: present status and prospects. Drug Discov. Today 24 (3), 773–780 (2019).
- 10. Ahuja, A. S. The impact of artificial intelligence in medicine on the future role of the physician. PeerJ 7, e7702 (2019).
- 11. Rao, G. N. et al. AI-driven drug discovery: computational methods and applications. In 5th Int. Conf. Recent Trends Comput. Sci. Technol. (ICRTCST), 46–50 (2024).
- 12. Davenport, T. & Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 6 (2), 94–98 (2019).
- 13. Peña-Guerrero, J., Nguewa, P. A. & García‐Sosa, A. T. Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11 (5), e1513 (2021).
- 14. Sulaiman, A. T. et al. A particle swarm and smell agent-based hybrid algorithm for enhanced optimization. Algorithms 17 (2), 53 (2024).
- 15. Sharma, M., Bhatia, A., Akhil, Dutta, A. K. & Alsubai, S. Optimizing hybrid deep learning models for drug-target interaction prediction: a comparative analysis of evolutionary algorithms. Expert Syst. 42 (2), e13683 (2025).
- 16. Yang, Y. & Cheng, F. Artificial intelligence streamlines scientific discovery of drug–target interactions. Br. J. Pharmacol. (2025).
- 17. Jiménez-Luna, J., Grisoni, F., Weskamp, N. & Schneider, G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin. Drug Discov. 16 (9), 949–959 (2021).
- 18. Jamali, A. et al. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov. Today 21 (5), 718–724 (2016).
- 19. Jiang, D. et al. ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning. J. Cheminform. 12, 21 (2020).
- 20. Fralish, Z., Skaluba, P. & Reker, D. Leveraging bounded datapoints to classify molecular potency improvements. RSC Med. Chem. 15, 2474–2482 (2024).
- 21. You, J., McLeod, R. & Hu, P. Predicting drug-target interaction network using deep learning model. Comput. Biol. Chem. 80, 90–101 (2019).
- 22. Stepniewska-Dziubinska, M. M., Zielenkiewicz, P. & Siedlecki, P. Improving detection of protein–ligand binding sites with 3D segmentation. Sci. Rep. 9, 18123 (2019).
- 23. Monteiro, N. R. C., Ribeiro, B. & Arrais, J. P. Drug-target interaction prediction: end-to-end deep learning approach. IEEE/ACM Trans. Comput. Biol. Bioinform. 18 (6), 2364–2374 (2020).
- 24. Lin, J. et al. Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier. Artif. Intell. Med. 98, 35–47 (2019).
- 25. Chen, C. et al. Improving protein–protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier. Comput. Biol. Med. 123, 103899 (2020).
- 26. Sikander, R., Ghulam, A. & Ali, F. XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized feature set. Sci. Rep. 12 (1), 5505 (2022).
- 27. Zhang, M., Wan, F. & Liu, T. DrugFinder: druggable protein identification model based on pre-trained models and evolutionary information. Algorithms 16 (6), 263 (2023).
- 28. Wu, Z. et al. ADMET evaluation in drug discovery. 19. Reliable prediction of human cytochrome P450 inhibition using artificial intelligence approaches. J. Chem. Inf. Model. 59 (12), 5126–5134 (2019).
- 29. Song, J. et al. Prediction of protein-ATP binding residues based on ensemble of deep convolutional neural networks and LightGBM algorithm. Int. J. Mol. Sci. 22 (3), 1218 (2021).
- 30. Wang, Y. et al. Capsule networks showed excellent performance in the classification of hERG blockers/nonblockers. Front. Pharmacol. 10, 1666 (2020).
- 31. Mustapha, I. B. & Saeed, F. Bioactive molecule prediction using extreme gradient boosting. Molecules 21, 2593–2607 (2016).
- 32. Kundu, D. et al. Application of quantum tensor networks for protein classification. In Proc. Great Lakes Symp. VLSI, 132–137 (2024).
- 33. Toussi, C. A. & Haddadnia, J. Improving protein secondary structure prediction: the evolutionary optimized classification algorithms. Struct. Chem. 30 (4), 1335–1344 (2019).
- 34. Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36 (Suppl 1), D901–D906 (2008).
- 35. UniProt. Swiss-Prot: the manually annotated and reviewed protein sequence database. https://www.uniprot.org/ (2023).
- 36. Nguyen, R., Sokhansanj, B. A., Polikar, R. & Rosen, G. L. Complet+: a computationally scalable method to improve completeness of large-scale protein sequence clustering. PeerJ 11, e14779 (2023).
- 37. Wei, Z. G. et al. Comparison of methods for biological sequence clustering. IEEE/ACM Trans. Comput. Biol. Bioinform. 20 (5), 2874–2888 (2023).
- 38. Chen, J. et al. A sequence-based transformer protein language model to identify potentially druggable protein targets. Protein Sci. 32 (2), e4555 (2023).