Abstract
Drug-Target Interaction (DTI) prediction is a vital task in drug discovery, yet it faces significant challenges such as data imbalance and the complexity of biochemical representations. This study makes several contributions to address these issues, introducing a novel hybrid framework that combines advanced machine learning (ML) and deep learning (DL) techniques. The framework leverages comprehensive feature engineering, utilizing MACCS keys to extract structural drug features and amino acid/dipeptide compositions to represent target biomolecular properties. This dual feature extraction method enables a deeper understanding of chemical and biological interactions, enhancing predictive accuracy. To address data imbalance, Generative Adversarial Networks (GANs) are employed to create synthetic data for the minority class, effectively reducing false negatives and improving the sensitivity of the predictive model. The Random Forest Classifier (RFC) is utilized to make precise DTI predictions, optimized for handling high-dimensional data. The proposed framework’s scalability and robustness were validated across diverse datasets, including BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. For the BindingDB-Kd dataset, the GAN+RFC model achieved remarkable performance metrics: accuracy of 97.46%, precision of 97.49%, sensitivity of 97.46%, specificity of 98.82%, F1-score of 97.46%, and ROC-AUC of 99.42%. Similarly, for the BindingDB-Ki dataset, the model attained an accuracy of 91.69%, precision of 91.74%, sensitivity of 91.69%, specificity of 93.40%, F1-score of 91.69%, and ROC-AUC of 97.32%. On the BindingDB-IC50 dataset, the model achieved an accuracy of 95.40%, precision of 95.41%, sensitivity of 95.40%, specificity of 96.42%, F1-score of 95.39%, and ROC-AUC of 98.97%. These results demonstrate the efficacy of the GAN-based approach in capturing complex patterns, significantly improving DTI prediction outcomes. 
In conclusion, the proposed GAN-based hybrid framework sets a new benchmark in computational drug discovery by addressing critical challenges in DTI prediction. Its robust performance, scalability, and generalizability contribute substantially to therapeutic development and pharmaceutical research.
Keywords: Drug-Target interaction, Generative adversarial networks, Machine learning, Random forest classifier, Data imbalance, Computational drug discovery
Subject terms: Drug discovery, Computer science
Introduction
The discovery of new drugs is a crucial and time-consuming process in modern medicine, with pharmaceutical companies continuously striving to identify effective treatments for various diseases1,2. According to recent statistics, the global pharmaceutical market is expected to reach a value of $1.5 trillion by 2025, driven by an increasing demand for new and innovative therapies3. However, despite these advancements, the process of drug development is still plagued by high costs and long timelines, with many potential drug candidates failing during the clinical trial phases. A key component of successful drug discovery is understanding the interactions between drugs and their target proteins, which can significantly influence the efficacy and safety of therapeutic agents4.
Drug-Target Interaction (DTI) prediction is a critical aspect of drug discovery, as it helps in identifying potential drug candidates that can interact with target proteins to exert their desired therapeutic effects5. Recent statistics show that approximately 60-70% of drug candidates fail due to poor efficacy or adverse effects, highlighting the importance of accurate DTI prediction6. Traditional experimental methods for predicting DTIs are costly, time-consuming, and labor-intensive. These challenges, along with the sheer complexity of biochemical systems, have led researchers to explore computational approaches that can predict DTIs more efficiently. However, the complexity of these systems, including the vast diversity of drug molecules and target proteins, still poses significant challenges for computational models.
Recent advances in Artificial Intelligence (AI) and Machine Learning (ML)7 models have shown great promise in improving the accuracy and efficiency of DTI predictions8. Traditional DTI prediction models, such as those based on molecular docking or ligand-based methods, often struggle to handle the high-dimensional and noisy data inherent in biological systems. In contrast, ML models have demonstrated superior performance by learning complex patterns from large datasets. These methods can address issues such as data imbalance, which is a common problem in DTI datasets where the number of non-interacting pairs far outweighs the interacting ones. With these advancements, ML-based models offer a scalable and robust solution for accelerating drug discovery, enabling researchers to make more informed decisions in the early stages of drug development.
Wei et al.9 proposed DeepLPI, a deep learning (DL)-based model combining a ResNet-based 1D CNN and a bi-directional LSTM (biLSTM) to predict protein-ligand interactions. Raw drug molecular and target protein sequences were encoded into dense vector representations and processed through two ResNet-based 1D CNN modules to extract features. These features were concatenated and passed through the biLSTM network, followed by an MLP module for final prediction. The model was trained and tested on the BindingDB and Davis datasets. In the BindingDB dataset, DeepLPI achieved an AUC-ROC of 0.893, sensitivity of 0.831, and specificity of 0.792 on the training set, and an AUC-ROC of 0.790, sensitivity of 0.684, and specificity of 0.773 on the test set. When compared to baseline methods like DeepCDA and DeepDTA, DeepLPI outperformed them, demonstrating high accuracy and robust generalization capability. These results indicate that DeepLPI has the potential to identify new drug-target interactions and improve drug discovery. Pei et al.10 proposed a label aggregation method with pair-wise retrieval and a representation aggregation method with point-wise retrieval of nearest neighbors, which efficiently boosted DTA prediction performance during the inference phase without any training cost. Additionally, an extension called Ada-kNN-DTA was introduced, featuring instance-wise and adaptive aggregation with lightweight learning. Results from four benchmark datasets demonstrated that kNN-DTA significantly outperformed previous state-of-the-art (SOTA) methods. On the BindingDB IC50 and Ki testbeds, kNN-DTA achieved new records of RMSE 0.684 and 0.750. The Ada-kNN-DTA extension further improved performance, reaching RMSE values of 0.675 and 0.735. These results confirmed the effectiveness of the proposed methods. 
Further analyses and results across different settings highlighted the great potential of the kNN-DTA approach, establishing it as a powerful tool for enhancing DTA prediction with minimal computational cost. Zhu et al.11 developed MDCT-DTA, a novel model for drug-target affinity (DTA) prediction, which combined multi-scale diffusion and interactive learning. To overcome the limitations of existing approaches, the model incorporated a multi-scale graph diffusion convolution (MGDC) module to effectively capture intricate interactions among drug molecular graph nodes. A CNN-Transformer Network (CTN) block was also introduced to model the interdependencies between amino acids, enhancing the model’s representation and learning capabilities. Additionally, a local inter-layer information interaction structure was designed to analyze relationships between drug and protein features, improving the robustness and representativeness of structural features. The model’s effectiveness was evaluated on the BindingDB benchmark dataset. Experimental results showed that MDCT-DTA accurately predicted drug-target binding affinities, achieving an MSE of 0.475 on the BindingDB dataset. These outcomes highlighted the model’s potential, offering new insights into DTA prediction and advancing the development of robust predictive frameworks. Schuh et al.12 developed BarlowDTI, a novel approach for DTI prediction that utilized the Barlow Twins architecture for feature extraction, focusing on the structural properties of target proteins. The model achieved state-of-the-art performance across the BindingDB-Kd benchmark using one-dimensional input data, with a ROC-AUC score of 0.9364. By employing a gradient boosting machine as the predictor, it ensured fast, resource-efficient predictions. Analysis of co-crystal structures demonstrated that BarlowDTI effectively identified catalytically active and stabilizing residues, showcasing its generalization capabilities.
This innovation enhanced DTI prediction efficiency and accuracy, contributing to drug discovery advancements and a deeper understanding of molecular interactions. Guichaoua et al.13 introduced a novel approach to address two key challenges in DTI prediction: the need for large, high-quality datasets and scalable prediction methods. They developed LCIdb, a curated, extensive DTI dataset with enhanced molecule and protein space coverage, surpassing traditional benchmarks. Additionally, they proposed Komet (Kronecker Optimized METhod), a scalable prediction pipeline using a three-step framework with efficient computations and the Nyström approximation. Komet’s Kronecker interaction module effectively balanced expressiveness and computational complexity. Implemented as open-source software with GPU parallelization, Komet achieved superior scalability and performance, with a ROC-AUC of 0.70 on BindingDB, outperforming existing DL methods. Pei et al.14 proposed a novel framework that incorporated three simple yet effective strategies to enhance Drug-Target Affinity (DTA) prediction. First, a multi-task training approach was employed, which jointly learned DTA prediction and masked language modeling (MLM) on paired drug-target datasets. Second, a semi-supervised learning technique was introduced, utilizing large-scale unpaired molecular and protein data to improve representation learning, unlike traditional pre-training methods that only considered either molecules or proteins in isolation. Third, a cross-attention module was integrated to strengthen the interaction between drug and target representations. Extensive experiments conducted on real-world benchmark datasets, such as BindingDB-IC50, demonstrated that the proposed framework significantly outperformed existing models. It achieved state-of-the-art results, including an RMSE of 0.712 on the BindingDB-IC50 dataset, reflecting over a 5% improvement compared to previous best-performing methods.
Problem statements
Despite significant advancements in DTI prediction, several critical challenges remain:
Integration of Chemical and Biological Information: Current methodologies struggle to effectively combine chemical fingerprint representations of drugs and biomolecular features of targets, limiting their capacity to capture the complex biochemical and structural relationships necessary for accurate DTI prediction.
Data Imbalance in Experimental Datasets: Imbalanced datasets, where the minority class of positive drug-target interactions is underrepresented, lead to biased models that exhibit reduced sensitivity and higher rates of false negatives in prediction tasks.
Limitations of Traditional Drug Discovery Approaches: Conventional methods for drug discovery are time-consuming, expensive, and lack scalability, making them unsuitable for addressing the increasing complexity and speed required in modern drug development.
Threshold Optimization for Drug-Target Interaction: Existing methods lack systematic evaluation for selecting optimal thresholds to classify drug-target interactions, leading to inaccuracies. This research addresses this gap by experimentally analyzing multiple thresholds to reliably balance the data and improve DTI classification.
These gaps highlight the necessity for innovative computational frameworks that address data integration, imbalance, and scalability to accelerate drug discovery and enhance predictive accuracy.
Objectives
The primary objective of this research is to design a hybrid framework that integrates advanced feature engineering, data balancing, and ML techniques for accurate DTI prediction. The study aims to unify drug fingerprints and target compositions into a single feature representation to enhance predictive performance. It also addresses data imbalance by employing Generative Adversarial Networks (GANs) to improve sensitivity and reduce false negatives. Furthermore, the framework is designed to be scalable and robust, ensuring its applicability across diverse datasets and drug discovery scenarios. By achieving these objectives, the research aims to contribute significantly to the development of robust computational tools for accelerating drug discovery processes.
Contributions
This study makes the following significant contributions to the field of DTI prediction:
Develop a Hybrid Model for DTI Prediction: Design and implement a novel hybrid framework that integrates ML and DL techniques to enhance the accuracy and reliability of DTI predictions.
Express Chemical and Biological Information: Utilize advanced techniques to extract structural features of drugs through MACCS keys and biomolecular features of targets via amino acid and dipeptide compositions. This approach captures both chemical and biological intricacies, enabling a comprehensive understanding of drug-target interactions and significantly enhancing predictive accuracy in computational drug discovery.
Address Data Imbalance Challenges: Employ Generative Adversarial Networks (GANs) to generate synthetic data for the minority class, mitigating the adverse effects of data imbalance and improving the sensitivity of the prediction model.
Demonstrate Scalability and Robustness: Validate the scalability of the proposed framework across datasets of different sizes and distributions to ensure its applicability in diverse drug discovery scenarios.
These contributions collectively advance the field of computational drug discovery, providing a scalable, accurate, and robust solution for predicting DTIs and accelerating drug development processes.
Research questions
RQ1: How can the proposed ML model improve the prediction accuracy of drug-target interactions compared to traditional ML/DL models?
RQ2: What are the benefits of GAN data balancing technique for DTI prediction?
RQ3: Can the proposed method enhance predictive performance and scalability in drug discovery?
Hypothesis
The proposed hybrid framework, integrating machine learning (ML) and deep learning (DL) techniques, aims to significantly improve drug-target interaction (DTI) prediction by addressing limitations in data representation, imbalance, and scalability. By unifying drug fingerprints (MACCS keys) and biomolecular compositions (amino acid and dipeptide sequences), the model enhances feature representation, leading to higher predictive accuracy. Additionally, the incorporation of Generative Adversarial Networks (GANs) to generate synthetic data for the minority class effectively mitigates data imbalance, improving sensitivity and reducing false negatives in predictions. The framework is designed for scalability and robustness, ensuring adaptability across datasets of varying sizes and distributions, making it highly applicable to diverse drug discovery scenarios. By systematically validating these aspects, the proposed approach is expected to outperform traditional ML/DL models in predictive performance, contributing to the development of scalable, data-driven drug discovery solutions.
The remaining sections of this paper are structured as follows: Sect. “Materials and Method” provides a detailed explanation of our research materials and methods used in our experiment. The results, including the environment, performance metrics, and evaluation, are outlined in Sect. “Performance Analysis”. Section “Discussion” presents the comparison analysis with existing works and research questions and hypothesis validation. Lastly, Sect. “Conclusion” presents the conclusion and future work.
Materials and method
This study proposes a novel methodology that integrates robust feature extraction, data balancing using GANs, and advanced ML and DL models to improve the accuracy and scalability of DTI prediction. The methodology (Fig. 1) begins with dataset preparation using the BindingDB datasets obtained from the TDC library. Binding affinities are binarized with thresholds of 10, 20, and 30 nM, categorizing interactions into positive (binding) and negative (non-binding). Drug compounds are represented using MACCS fingerprints generated via the RDKit library, while protein targets are encoded through amino acid composition (AAC) and dipeptide composition (DC), normalized by sequence length. The MACCS keys capture substructure-level chemical information, complementing the compositional protein features. Invalid rows resulting from failed fingerprint generation or sequence analysis are excluded to ensure data integrity. The resulting feature matrix is standardized using StandardScaler to facilitate model training. To address class imbalance, a Generative Adversarial Network (GAN) is employed. The GAN architecture includes a generator, which synthesizes realistic feature vectors for the minority class, and a discriminator, which differentiates between real and synthetic data. This iterative process generates high-quality synthetic samples, effectively balancing the dataset. The combined real and synthetic dataset is split into training (80%) and testing (20%) subsets using train_test_split. The study evaluates several ML and DL models. Traditional ML models, including the Decision Tree Classifier (DTC), Random Forest Classifier (RFC), and Multilayer Perceptron (MLP), are compared with advanced DL models such as a Fully Connected Neural Network (FCNN), a Multi-Head Attention-integrated FCNN (MHA-FCNN), DeepLPI, BarlowDTI, and Komet. Model performance is assessed using metrics such as accuracy, precision, recall, F1 score, and ROC-AUC.
Results demonstrate that the GAN+RFC outperforms traditional models, achieving superior predictive accuracy and scalability. This comprehensive methodology highlights the importance of robust feature engineering and data balancing in advancing computational drug discovery.
Fig. 1.
The proposed drug-target interaction prediction architecture.
Dataset collection
In this study, we utilized three datasets from the BindingDB resource: BindingDB_Kd, BindingDB_Ki, and BindingDB_IC50. These datasets are critical for predicting drug-target interactions (DTIs), specifically focusing on the binding affinities between drug-like small molecules and protein targets. BindingDB15 is a public, web-accessible database that contains experimentally measured binding affinities, primarily for interactions between proteins and drug-like small molecules. It serves as a valuable resource for computational drug discovery, facilitating the analysis and prediction of DTIs. Each record in the database includes detailed information on the target proteins, the interacting compounds, and their respective binding affinities. The prediction task for each dataset is originally framed as a regression problem, which we converted into a binary classification task by applying thresholds of 10, 20, and 30 nM.
Given the amino acid sequence of the target protein and the SMILES (Simplified Molecular Input Line Entry System) string representation of the compound, the goal is to predict the binding affinity between the two entities. Binding affinity is measured as Kd (dissociation constant), Ki (inhibition constant), or IC50 (half-maximal inhibitory concentration), each of which quantifies the strength of the interaction.
Dataset statistics:
The statistics of the datasets (Table 1) used in our experiments are as follows:
BindingDB_Kd: Contains 52,284 DTI pairs, 10,665 unique drugs, and 1,413 unique proteins.
BindingDB_Ki: Contains 375,032 DTI pairs, 174,662 unique drugs, and 3,070 unique proteins.
BindingDB_IC50: Contains 991,486 DTI pairs, 549,205 unique drugs, and 5,078 unique proteins.
Table 1.
Statistics of BindingDB datasets.
| Dataset | DTI Pairs | Unique drugs | Unique proteins |
|---|---|---|---|
| BindingDB_Kd | 52,284 | 10,665 | 1,413 |
| BindingDB_Ki | 375,032 | 174,662 | 3,070 |
| BindingDB_IC50 | 991,486 | 549,205 | 5,078 |
Relevance to the study:
These datasets provide a diverse and comprehensive set of interactions, making them ideal for training and evaluating ML models aimed at predicting DTIs. The large number of data points in BindingDB_Ki and BindingDB_IC50 enables robust training of models, while the smaller BindingDB_Kd dataset offers an opportunity to evaluate model performance on more limited samples. By using these datasets, our study aims to enhance the predictive accuracy of binding affinity while addressing challenges such as data imbalance and feature representation.
Data preprocessing
To ensure consistency and enhance the quality of the data for model training, several preprocessing steps were applied. These steps included affinity harmonization and binarization, which transformed the regression task into a classification task.
Affinity harmonization:
The binding affinities in the datasets often originate from diverse experimental conditions, leading to variations in their reported values. To address this inconsistency, we harmonized the affinities using the TDC data.harmonize_affinities function in mean mode, which works as follows:
Input: A dataset containing binding affinity values under various experimental conditions.
Process: The function calculates the mean affinity for each set of measurements and adjusts all affinity values to align with this mean. This reduces variance caused by different conditions and produces a standardized dataset.
Output: A dataset with harmonized affinity values, suitable for use in predictive modeling.
By harmonizing the affinities, we minimize noise and ensure that the data is consistent across all samples.
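The effect of mean-mode harmonization can be sketched with pandas. This is a minimal re-implementation of the behavior described above, not the TDC function itself; the column names (Drug_ID, Target_ID, Y) are assumptions based on the TDC schema:

```python
import pandas as pd

def harmonize_affinities_mean(df):
    """Collapse replicate (drug, target) measurements to their mean affinity.

    Sketch of what harmonize_affinities(mode='mean') does: replicate
    measurements for the same pair are averaged into one row.
    """
    return (df.groupby(['Drug_ID', 'Target_ID'], as_index=False)
              .agg({'Y': 'mean'}))

# Two replicate measurements for the pair (D1, T1) are averaged.
df = pd.DataFrame({
    'Drug_ID':   ['D1', 'D1', 'D2'],
    'Target_ID': ['T1', 'T1', 'T1'],
    'Y':         [8.0, 12.0, 30.0],
})
harmonized = harmonize_affinities_mean(df)
print(harmonized['Y'].tolist())  # [10.0, 30.0]
```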
Affinity binarization:
The binding affinity values were binarized to convert the continuous affinity data into binary interaction labels. This step is crucial for the drug-target interaction (DTI) task, where the goal is to distinguish between strong and weak interactions. A threshold (Th) was applied to categorize the affinities: affinity values below Th (e.g., Th = 10 nM) indicate a strong interaction (label = 1), while affinity values equal to or above Th indicate no significant interaction (label = 0).
The binarization was performed using the TDC data.binarize function, which is implemented as follows:
Input: A dataset of continuous binding affinity values and a threshold (Th).
Parameters:
- threshold: The value(s) used to determine interaction strength; here 10, 20, and 30.
- order: Specifies the relationship between values and interaction strength. Here, ’descending’ indicates that lower affinity values correspond to stronger interactions.
Process:
- The function iterates through the dataset and compares each affinity value to the specified threshold.
- If the affinity value is below the threshold, it assigns a binary label of 1 (interaction).
- If the value is equal to or above the threshold, it assigns a binary label of 0 (no interaction).
Output: A dataset with binary interaction labels for each binding affinity value: a label of 1 indicates an interaction (affinity < Th), and a label of 0 indicates no interaction (affinity ≥ Th).
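A minimal sketch of this binarization logic (a re-implementation for illustration; the study uses the TDC library's data.binarize):

```python
import numpy as np

def binarize(y, threshold, order='descending'):
    """Convert continuous affinities to binary interaction labels.

    'descending' means lower affinity values (stronger binding) map to the
    positive class, matching the behavior described in the text.
    """
    y = np.asarray(y, dtype=float)
    if order == 'descending':
        return (y < threshold).astype(int)
    return (y > threshold).astype(int)

print(binarize([5.0, 10.0, 25.0], threshold=10))  # [1 0 0]
```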
The threshold values (Th = 10, 20, 30) used in the classification task are expressed in nanomolar (nM), the standard unit for the binding affinities (Kd, Ki, IC50) reported in BindingDB; lower values indicate stronger binding between a drug and its target. With the threshold set to 10 and the order specified as “descending,” binding affinity values below 10 nM are classified as positive interactions (label = 1), indicating strong interactions between molecules. Conversely, values equal to or above 10 nM are labeled as negatives (label = 0), signifying no significant interaction. The positive-to-negative ratio in the dataset is determined by counting the number of data points classified as positive (strong interactions) and negative (no interaction) based on this threshold. The positive-to-negative ratio of each dataset for Th = 10, 20, and 30 is shown in Table 2, which provides the counts of data points classified as positive (1) and negative (0) for the BindingDB_Kd, BindingDB_Ki, and BindingDB_IC50 datasets.
Table 2.
Positive-to-negative interactions of DTI datasets.
| Dataset | Threshold | Class 0 / Negative | Class 1 / Positive |
|---|---|---|---|
| BindingDB_Kd | 10 | 38,910 | 3,326 |
| BindingDB_Kd | 20 | 37,915 | 4,321 |
| BindingDB_Kd | 30 | 37,385 | 4,851 |
| BindingDB_Ki | 10 | 227,658 | 69,027 |
| BindingDB_Ki | 20 | 208,998 | 87,687 |
| BindingDB_Ki | 30 | 197,572 | 99,113 |
| BindingDB_IC50 | 10 | 664,750 | 102,154 |
| BindingDB_IC50 | 20 | 621,782 | 145,122 |
| BindingDB_IC50 | 30 | 597,037 | 169,867 |
Significance of preprocessing:
These preprocessing steps are crucial to mitigating data inconsistencies and aligning the data format with the requirements of binary classification tasks. By harmonizing and binarizing the data, the preprocessing pipeline enhances the interpretability and predictive accuracy of the models, especially in tasks involving datasets with varying scales of binding affinity values.
Feature engineering
Feature engineering is a critical step in the development of ML models, transforming raw data into meaningful representations to enhance predictive performance. In this study, feature engineering was applied to extract molecular fingerprints for drugs and amino acid compositions for protein sequences, followed by data scaling and feature integration.
Drug fingerprint extraction:
For drug molecules represented by SMILES strings, we utilized MACCS (Molecular ACCess System) keys as descriptors. These are binary fingerprints capturing the presence or absence of specific substructures in the molecular structure. The RDKit library was employed to compute MACCS keys for each drug molecule.
To handle invalid or non-parsable SMILES strings, rows with missing fingerprints were dropped. Each MACCS key was then converted into a feature vector, representing the structural properties of the drug molecules.
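The extraction step, including the handling of non-parsable SMILES, can be sketched with RDKit as follows (a minimal sketch; the helper name smiles_to_maccs is ours, not from the study):

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys

def smiles_to_maccs(smiles):
    """Return the 167-bit MACCS fingerprint as a list of 0/1 ints,
    or None for invalid SMILES (such rows are later dropped)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return list(MACCSkeys.GenMACCSKeys(mol))

fp = smiles_to_maccs('CCO')  # ethanol
print(len(fp))               # 167
```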
Protein sequence representation
Protein sequences are vital for understanding biological interactions, as their structure and composition play a significant role in determining biochemical behavior. To encode this information effectively, computational techniques are employed to extract meaningful representations. Two prominent methods, amino acid composition (AAC) and dipeptide composition (DC), are utilized to capture both individual amino acid frequencies and sequential dependencies within the sequences. These representations ensure that protein data is structured appropriately for machine learning models, aiding in predictive tasks such as drug-target interaction (DTI) modeling.
Amino acid composition:
Amino acid composition (AAC) quantifies the normalized frequency of each of the 20 standard amino acids in a protein sequence. This representation provides a high-level overview of the sequence’s biochemical properties. Mathematically, the AAC of amino acid a in a sequence of length L is computed as AAC(a) = count(a) / L, where count(a) is the number of occurrences of a in the sequence.
This method ensures uniform feature scaling and enables seamless integration into predictive models. The Python Counter class is employed to efficiently compute amino acid occurrences, followed by normalization with respect to the total sequence length, encoding the protein’s structural characteristics.
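The Counter-based computation described above can be sketched as:

```python
from collections import Counter

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'  # the 20 standard amino acids

def amino_acid_composition(sequence):
    """Normalized frequency of each standard amino acid (AAC):
    count(a) / L for each amino acid a, where L is the sequence length."""
    counts = Counter(sequence)
    length = len(sequence)
    return [counts.get(aa, 0) / length for aa in AMINO_ACIDS]

aac = amino_acid_composition('AAG')
print(round(aac[0], 3))  # frequency of 'A' = 2/3 -> 0.667
```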
Dipeptide composition:
To enrich protein sequence representation further, dipeptide composition (DC) is calculated. Dipeptides represent pairs of consecutive amino acids, capturing intricate sequential dependencies within the sequence. Using Python’s itertools.product, all possible dipeptides are generated from the 20 standard amino acids, yielding 400 unique combinations. The normalized frequency of each dipeptide is then computed using the Counter class, ensuring the representation accounts for sequence length variations. This detailed representation complements AAC by incorporating sequential patterns, enabling machine learning models to better understand the intricacies of protein structures and improving predictive accuracy for DTIs.
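The dipeptide enumeration and normalization steps above can be sketched as:

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'
# All 400 possible ordered pairs of standard amino acids.
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Normalized frequency of each of the 400 possible dipeptides,
    computed over the L-1 consecutive pairs in a sequence of length L."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = max(len(pairs), 1)  # guard against length-1 sequences
    return [counts.get(dp, 0) / total for dp in DIPEPTIDES]

dc = dipeptide_composition('AAG')  # pairs: 'AA', 'AG' -> each 1/2
print(len(dc))                     # 400
```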
Feature integration:
The drug fingerprint features and amino acid composition features were concatenated to form a unified feature matrix X, ensuring each row represents a drug-target pair. The feature matrix X was standardized using StandardScaler to ensure that all features were on the same scale.
This preprocessing step was crucial to prevent features with larger scales from dominating the learning process during model training.
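Concatenation and scaling can be sketched as follows; the random arrays here merely stand in for the real MACCS and AAC features, and the shapes are illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
drug_fp = rng.integers(0, 2, size=(5, 167)).astype(float)  # MACCS bits
target_aac = rng.random(size=(5, 20))                      # AAC frequencies

# One row per drug-target pair: fingerprint and composition side by side.
X = np.hstack([drug_fp, target_aac])
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.shape)  # (5, 187)
```

After scaling, every column has zero mean and unit variance (constant columns are mapped to zeros), so no single feature dominates training.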
Output feature matrix:
The resulting feature matrix X has the following structure:
Drug Fingerprints: Binary features representing the MACCS keys for each drug.
Target Compositions: Normalized amino acid compositions for each protein sequence.
Concatenated Features: Combined representation of drug-target interactions.
The target variable y consists of binary labels indicating the presence or absence of interactions.
Data validation and shape:
The shape of the feature matrix X was verified to confirm successful feature extraction and integration, ensuring alignment with the dimensions of the target labels. Standardization further optimized the feature matrix for ML tasks.
SMILES (Simplified Molecular Input Line Entry System) strings provide a compact and widely used representation of chemical structures. To ensure a consistent and interpretable feature representation, we utilized MACCS (Molecular ACCess System) keys as descriptors for the drug molecules. These are binary fingerprints designed to capture the presence or absence of specific substructures in the molecular structure, making them effective for computational modeling. For target proteins, we computed Amino Acid Composition (AAC), which captures the normalized frequencies of the 20 standard amino acids in the sequence, providing a comprehensive overview of the protein’s composition. The drug fingerprint features and protein AAC features were concatenated to form a unified feature matrix, representing each drug-target pair. Our approach was validated using three benchmark datasets, BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50, where SMILES strings were used to represent drug structures. This choice allows for scalability and compatibility with the datasets while leveraging the structural information effectively.
Data balancing using generative adversarial networks (GANs)
Class imbalance in datasets is a significant challenge for ML models, particularly in binary classification tasks. In the context of DTI prediction, the imbalance between interacting (minority) and non-interacting (majority) samples can bias the model toward the majority class, leading to suboptimal performance. To address this, we employ a Generative Adversarial Network (GAN) to synthesize realistic samples for the minority class. The GAN comprises two components: a generator and a discriminator. The generator learns to produce synthetic samples that resemble the minority class distribution, while the discriminator learns to differentiate between real and synthetic samples. Through adversarial training, where the generator aims to “fool” the discriminator, the generator progressively improves its ability to produce high-quality synthetic data. This process enhances the representation of the minority class in the dataset, addressing imbalances and reducing bias in model training.
The GAN-based approach (Algorithm 1) begins by training the discriminator with both real samples from the minority class and synthetic samples generated by the generator. Once the discriminator achieves adequate performance, the generator is trained to produce samples that the discriminator cannot distinguish from real data. After sufficient training epochs, the generator generates synthetic samples in quantities sufficient to balance the dataset. These synthetic samples are then combined with the original dataset, ensuring a balanced distribution of classes. This approach not only mitigates the limitations of under-sampling or over-sampling methods but also maintains the diversity of data, leading to improved model performance and robustness.
The following algorithm (Algorithm 1) outlines the steps for generating synthetic samples using a GAN to balance the dataset.
Algorithm 1.
Data Balancing Using GANs
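The balancing step of Algorithm 1 can be sketched as below. This is a schematic, assuming a trained generator is already available; `generate_minority` is a hypothetical stand-in for that generator (in the actual framework it would be the GAN's trained generator network producing synthetic minority-class feature vectors).

```python
import random


def balance_with_generator(X, y, generate_minority, minority_label=1, seed=0):
    """Once the GAN's generator is trained, synthesize exactly enough
    minority-class samples to match the majority class, then merge them
    with the original data. `generate_minority(n)` must return n
    synthetic feature vectors resembling the minority-class distribution."""
    random.seed(seed)
    n_minority = sum(1 for label in y if label == minority_label)
    n_majority = len(y) - n_minority
    n_needed = max(0, n_majority - n_minority)

    synthetic = generate_minority(n_needed)
    X_bal = list(X) + list(synthetic)
    y_bal = list(y) + [minority_label] * n_needed

    # Shuffle so synthetic samples are interleaved with real ones
    idx = list(range(len(X_bal)))
    random.shuffle(idx)
    return [X_bal[i] for i in idx], [y_bal[i] for i in idx]
```

This mirrors the final phase of Algorithm 1: the adversarial training itself happens upstream, and only the balancing and merging logic is shown here.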
Fig. 2.
Data distribution of BindingDB-Kd dataset.
Fig. 3.
Data distribution of BindingDB-Ki dataset.
Fig. 4.
Data distribution of BindingDB-IC50 dataset.
Machine and deep learning algorithms
The performance of ML and DL algorithms is critical in analyzing complex data. In this study, we employed several algorithms, including Decision Tree Classifier (DTC), Multilayer Perceptron Classifier (MLP), Random Forest Classifier (RFC), Fully Connected Neural Network (FCNN), and Multi-Head Attention Fully Connected Neural Network (MHA-FCNN). Each algorithm was evaluated to assess its suitability for predicting drug-target interactions (DTIs) using protein sequences and SMILES strings.
Decision tree classifier (DTC):
DTC is a simple and interpretable algorithm that partitions data into subsets based on feature values. It creates a tree-like structure where each node represents a decision, and the leaves represent the outcome. While DTC is computationally efficient and easy to implement, it can suffer from overfitting, especially with complex datasets [16]. In our study, DTC serves as a baseline model to compare with advanced algorithms.
Multilayer perceptron classifier (MLP):
MLP is a feedforward neural network that consists of multiple layers of neurons. Each neuron applies an activation function to a weighted sum of its inputs. MLP can learn non-linear relationships between features and is well-suited for structured datasets [17]. In this study, MLP helps bridge traditional ML methods with more advanced DL techniques by evaluating its performance on drug-target interaction data.
Random forest classifier (RFC):
RFC is an ensemble learning method that constructs multiple decision trees during training and combines their outputs for final predictions. It is robust to overfitting and effective for handling high-dimensional and complex data [18]. As the proposed model, RFC leverages its ability to model non-linear interactions between protein sequences and SMILES strings, providing a benchmark for our hybrid approach.
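A minimal scikit-learn sketch of this setup follows. The hyperparameters are illustrative, not the paper's tuned settings, and the feature matrix here is random stand-in data with the same 186-dimensional shape as the concatenated MACCS + AAC features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the unified drug-target feature matrix:
# 166 fingerprint bits + 20 AAC frequencies = 186 columns.
rng = np.random.default_rng(42)
X = rng.random((200, 186))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic interaction labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative settings; the paper does not publish its exact hyperparameters.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

The same `fit`/`predict` pattern applies once the real balanced feature matrix replaces the random stand-in.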
Fully connected neural network (FCNN):
FCNN, also known as a dense neural network, connects every neuron in one layer to every neuron in the next. This architecture allows it to learn complex patterns in data, making it well-suited for raw representations such as protein sequences and SMILES strings. The FCNN used in this study has three hidden layers with dropout for regularization. Its flexibility in learning intricate patterns provides valuable insights into the data.
Multi-head attention fully connected neural network (MHA-FCNN):
MHA-FCNN integrates multi-head attention mechanisms with FCNN layers to enhance the learning process. The attention mechanism focuses on relevant parts of the input, improving the model’s ability to capture meaningful patterns. By combining attention with dense layers, MHA-FCNN achieves better feature representation, particularly for sequences and structured chemical data, making it a powerful tool for DTI prediction. Table 3 shows the parameter settings for the FCNN and MHA-FCNN models.
Table 3.
Parameter settings for the FCNN and MHA-FCNN models.
| Parameter | FCNN | MHA-FCNN |
|---|---|---|
| Number of Dense Layers | 3 | 3 |
| Hidden Neurons | [128, 64, 32] | [128, 64, 32] |
| Dropout Rate | 0.3 | 0.3 |
| Attention Mechanism | N/A | Multi-Head Attention |
| Optimizer | Adam | Adam |
| Learning Rate | 0.001 | 0.001 |
| Loss Function | Binary Cross Entropy | Binary Cross Entropy |
| Activation Function (Hidden Layers) | ReLU | ReLU |
| Activation Function (Output) | Sigmoid | Sigmoid |
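A Keras sketch of the FCNN described by Table 3 follows. The framework choice and layer ordering are assumptions (the paper does not publish its implementation); the dense widths, dropout rate, optimizer, learning rate, loss, and activations come directly from Table 3.

```python
import tensorflow as tf


def build_fcnn(input_dim):
    """FCNN per Table 3: dense layers [128, 64, 32] with ReLU and
    dropout 0.3, a sigmoid output unit, Adam (lr=0.001), and binary
    cross-entropy loss."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The MHA-FCNN variant would insert a multi-head attention block before these dense layers; its exact wiring is not specified in Table 3, so it is omitted here.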
By comparing these algorithms, this study evaluates the balance between simplicity, interpretability, and the ability to extract meaningful features from complex datasets. The complementary strengths of RFC and MHA-FCNN provide robust tools for predicting DTIs, enhancing computational drug discovery methodologies.
Performance analysis
Environment
The experiments were conducted on an HP ProBook notebook PC with an 11th Gen Intel(R) Core(TM) i3-1115G4 processor (3.00 GHz base clock) and 16.00 GB of RAM. The experiments utilized Python libraries such as TensorFlow, PyTorch, and Scikit-learn for model development and evaluation.
Evaluation metrics
To evaluate the performance of our predictive models for drug-target interaction, we employed a variety of metrics covering both the classification and error-based aspects of the prediction task. Together, these metrics provide a comprehensive understanding of the model’s effectiveness and reliability.
- Accuracy: The proportion of correctly classified instances (both positive and negative) out of the total predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN).

- Precision: The proportion of predicted positive instances that are truly positive: Precision = TP / (TP + FP).

- Sensitivity (Recall): The ability of the model to correctly identify positive instances: Sensitivity = TP / (TP + FN).

- Specificity: The ability of the model to correctly identify negative instances: Specificity = TN / (TN + FP).

- F1-Score: The harmonic mean of precision and recall, balancing the trade-off between false positives and false negatives: F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity).

- Receiver Operating Characteristic – Area Under the Curve (ROC-AUC) Score: Evaluates a model’s ability to distinguish between classes across various threshold values. It ranges from 0 to 1, where a higher score indicates better performance, with 1 representing perfect classification and 0.5 suggesting random guessing.
- Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual values, quantifying the prediction accuracy: MAE = (1/n) Σ |yᵢ − ŷᵢ|.

- Mean Squared Error (MSE): The average of the squared differences between predicted and actual values: MSE = (1/n) Σ (yᵢ − ŷᵢ)². This metric penalizes larger errors more significantly, making it highly sensitive to outliers.

- Root Mean Squared Error (RMSE): The square root of the MSE, providing an error measurement in the same units as the predicted values: RMSE = √MSE. This metric helps interpret the magnitude of errors more intuitively.

- Confusion Matrix: A two-dimensional table that presents the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. It offers a detailed visualization of the model’s performance across binary classification tasks.

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |

Where TP denotes True Positives (correctly predicted interactions), TN denotes True Negatives (correctly predicted non-interactions), FP denotes False Positives (incorrectly predicted interactions), and FN denotes False Negatives (incorrectly predicted non-interactions).

These metrics collectively evaluate the regression and classification aspects of our models, ensuring robust analysis of their predictive capabilities.
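The threshold-based metrics above all follow directly from the four confusion-matrix counts; a compact helper makes the definitions concrete:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, precision, and F1
    directly from confusion-matrix counts (assumes no zero denominators)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

For example, 40 true positives, 50 true negatives, 5 false positives, and 5 false negatives give an accuracy of 0.90.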
- Statistical Significance Testing (Friedman Test): To ensure the reliability of model comparisons, we utilized the Friedman test, a non-parametric statistical test designed to detect differences across multiple models. The Friedman test evaluates whether at least one model significantly outperforms the others and is commonly applied to rank-based results from multiple experiments. The test statistic is

χ²_F = [12 / (N · k · (k + 1))] · Σⱼ Rⱼ² − 3 · N · (k + 1)

where χ²_F is the Friedman test statistic, N is the number of datasets or observations, k is the number of models being compared, and Rⱼ is the sum of ranks assigned to model j across the datasets. The null hypothesis states that all models perform equally well; a significant p-value (< 0.05) indicates rejection of the null hypothesis, suggesting differences in model performance.

By combining these metrics and statistical tests, we gained comprehensive insights into both the predictive performance and statistical robustness of our approach.
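A small, self-contained implementation of the Friedman statistic defined above follows (average ranks are used for ties; in practice `scipy.stats.friedmanchisquare` performs the same computation and also returns the p-value):

```python
def _average_ranks(row):
    """Rank one row of scores (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # average of positions i..j, 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores: N rows (datasets), each with k model scores.
    Returns chi^2_F = 12/(N*k*(k+1)) * sum_j R_j^2 - 3*N*(k+1)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(_average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
```

With three datasets and three models ranked identically every time, the statistic reaches its maximum of N(k − 1) = 6; identical scores everywhere give 0.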
Result analysis
We have presented a comprehensive analysis of the experimental results obtained from evaluating various machine learning (ML) and deep learning (DL) models on benchmark drug-target interaction datasets, including BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. The analysis focuses on key performance metrics such as Accuracy, Precision, Sensitivity, Specificity, F1-score, Kappa, MCC, ROC-AUC, MAE, MSE, and RMSE across different binarization thresholds of 10, 20, and 30. The aim is to assess the predictive capabilities and robustness of traditional classifiers and modern neural architectures in learning complex patterns from biochemical interaction data. By comparing the performance of each model under consistent experimental settings, we highlight the strengths and limitations of the respective approaches, offering insights into their suitability for real-world drug discovery applications.
The BindingDB-Kd dataset was evaluated using a diverse set of traditional machine learning (ML) and deep learning (DL) models across three classification thresholds: 10, 20, and 30 (Table 4). A comprehensive performance comparison was conducted using multiple evaluation metrics, including Accuracy, Precision, Sensitivity (Recall), Specificity, F1-score, Cohen’s Kappa, Matthews Correlation Coefficient (MCC), ROC-AUC, Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).

At the 10 threshold, the Random Forest Classifier (RFC) delivered the strongest performance among all models, achieving the highest accuracy of 97.46%, along with the best results across most metrics, such as Precision (97.49%), Specificity (98.82%), and ROC-AUC (99.42), and the lowest error rates (MAE and MSE of 2.54). The Komet model also showed competitive performance with an accuracy of 97.40%, followed closely by MHA-FCNN and FCNN, which maintained strong F1-scores and ROC-AUC values above 99.2, indicating the robustness of deep learning architectures in capturing nonlinear patterns.

When evaluated at the 20 threshold, a slight decline in performance was observed across all models due to the increased classification difficulty. Nonetheless, RFC continued to lead with the best accuracy of 96.56%, F1-score of 96.56%, and a top ROC-AUC of 99.20. Komet and MHA-FCNN remained strong contenders, with accuracies of 96.28% and 96.10%, respectively, and high Specificity values, showcasing their generalizability in slightly noisier settings.

At the more relaxed 30 threshold, the trend of gradual performance degradation persisted across the board. However, RFC remained the top performer with an accuracy of 96.27%, Precision of 96.33%, and a ROC-AUC of 99.12, reaffirming its reliability across varied thresholds. DeepLPI and Komet followed closely, achieving accuracies of 96.00% and 95.97%, respectively, with strong ROC-AUC and error metrics.
Overall, the RFC consistently outperformed all other models at every threshold, reflecting its strong capability in handling structured biochemical data. Deep learning models such as MHA-FCNN, DeepLPI, and BarlowDTI also exhibited impressive and stable results, particularly in Specificity and ROC-AUC, making them suitable for high-confidence classification in drug-target binding prediction tasks.
Table 4.
Performance analysis of ML and DL models on BindingDB-Kd.
| Dataset | Threshold | Model | Accuracy | Precision | Sensitivity | Specificity | F1-score | Kappa | MCC | ROC-AUC | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindingDB_Kd | 10 | DTC | 96.47 | 96.47 | 96.47 | 96.45 | 96.47 | 92.93 | 92.93 | 96.66 | 3.53 | 3.53 | 18.80 |
| | | MLP | 97.13 | 97.14 | 97.13 | 97.91 | 97.13 | 94.26 | 94.27 | 99.15 | 2.87 | 2.87 | 16.95 |
| | | RFC | 97.46 | 97.49 | 97.46 | 98.82 | 97.46 | 94.91 | 94.95 | 99.42 | 2.54 | 2.54 | 15.95 |
| | | FCNN | 97.14 | 97.20 | 97.14 | 98.84 | 97.14 | 94.28 | 94.34 | 99.28 | 2.86 | 2.86 | 16.91 |
| | | MHA-FCNN | 97.21 | 97.25 | 97.21 | 98.70 | 97.21 | 94.42 | 94.46 | 99.29 | 2.79 | 2.79 | 16.70 |
| | | DeepLPI | 97.03 | 97.04 | 97.03 | 97.88 | 97.02 | 94.05 | 94.06 | 99.34 | 2.97 | 2.97 | 17.25 |
| | | BarlowDTI | 97.16 | 97.17 | 97.16 | 97.96 | 97.16 | 94.32 | 94.33 | 99.28 | 2.84 | 2.84 | 16.85 |
| | | Komet | 97.40 | 97.43 | 97.40 | 98.59 | 97.40 | 94.81 | 94.83 | 99.39 | 2.60 | 2.60 | 16.11 |
| | 20 | DTC | 95.26 | 95.26 | 95.26 | 95.15 | 95.26 | 90.52 | 90.52 | 95.50 | 4.74 | 4.74 | 21.77 |
| | | MLP | 95.98 | 96.00 | 95.98 | 97.04 | 95.98 | 91.96 | 91.98 | 98.79 | 4.02 | 4.02 | 20.06 |
| | | RFC | 96.56 | 96.62 | 96.56 | 98.39 | 96.56 | 93.12 | 93.18 | 99.20 | 3.44 | 3.44 | 18.55 |
| | | FCNN | 96.08 | 96.15 | 96.08 | 98.00 | 96.08 | 92.16 | 92.22 | 99.02 | 3.92 | 3.92 | 19.81 |
| | | MHA-FCNN | 96.10 | 96.21 | 96.10 | 98.47 | 96.10 | 92.21 | 92.31 | 98.92 | 3.90 | 3.90 | 19.74 |
| | | DeepLPI | 96.27 | 96.31 | 96.27 | 97.82 | 96.27 | 92.54 | 92.58 | 99.12 | 3.73 | 3.73 | 19.32 |
| | | BarlowDTI | 96.33 | 96.37 | 96.33 | 97.82 | 96.33 | 92.66 | 92.70 | 99.08 | 3.67 | 3.67 | 19.16 |
| | | Komet | 96.28 | 96.35 | 96.28 | 98.14 | 96.28 | 92.56 | 92.63 | 99.07 | 3.72 | 3.72 | 19.28 |
| | 30 | DTC | 94.92 | 94.92 | 94.92 | 95.04 | 94.92 | 89.85 | 89.85 | 95.18 | 5.08 | 5.08 | 22.53 |
| | | MLP | 95.92 | 95.95 | 95.92 | 97.18 | 95.92 | 91.84 | 91.87 | 98.77 | 4.08 | 4.08 | 20.20 |
| | | RFC | 96.27 | 96.33 | 96.27 | 98.18 | 96.27 | 92.53 | 92.60 | 99.12 | 3.73 | 3.73 | 19.32 |
| | | FCNN | 95.55 | 95.69 | 95.55 | 98.34 | 95.54 | 91.09 | 91.23 | 98.88 | 4.45 | 4.45 | 21.10 |
| | | MHA-FCNN | 95.60 | 95.66 | 95.60 | 97.39 | 95.60 | 91.20 | 91.25 | 98.79 | 4.40 | 4.40 | 20.98 |
| | | DeepLPI | 96.00 | 96.08 | 96.00 | 98.13 | 96.00 | 92.00 | 92.08 | 98.95 | 4.00 | 4.00 | 20.00 |
| | | BarlowDTI | 95.93 | 95.95 | 95.93 | 96.99 | 95.93 | 91.87 | 91.89 | 98.84 | 4.07 | 4.07 | 20.16 |
| | | Komet | 95.97 | 96.06 | 95.97 | 98.22 | 95.97 | 91.94 | 92.04 | 98.92 | 4.03 | 4.03 | 20.06 |
On the BindingDB-Ki dataset (Table 5), which focuses on inhibition constant (Ki) data, similar improvements were observed when applying the GAN balancing technique. In the WOB experiment, the RFC model recorded an accuracy of 88.72%, precision of 88.73%, sensitivity of 88.72%, and specificity of 89.67%. In contrast, after applying GAN-based balancing, the RFC model achieved an accuracy of 92.77%, precision of 92.94%, sensitivity of 92.77%, and specificity of 97.89%. The F1-score improved to 92.93%, and the ROC-AUC increased to 93.86%, highlighting the effectiveness of the GAN+RFC hybrid model. These results confirm that the RFC model, when combined with the GAN balancing technique, outperforms other models in predicting drug-target interactions on the BindingDB-Ki dataset. The p-value of 0.0916 indicated that the differences in performance between the models were not statistically significant, but RFC still emerged as the top performer across the metrics. These findings underscore the reliability and accuracy of RFC in predicting drug-target interactions for both the BindingDB-Kd and BindingDB-Ki datasets, establishing it as the proposed model for DTI prediction in this study.

The performance of several machine learning (ML) and deep learning (DL) models was evaluated on the BindingDB-Ki dataset across three threshold levels: 10, 20, and 30. The analysis encompassed key metrics such as Accuracy, Precision, Sensitivity, Specificity, F1-score, Kappa, MCC, ROC-AUC, MAE, MSE, and RMSE. At a threshold of 10, the Random Forest Classifier (RFC) outperformed all other models, achieving the highest accuracy of 91.69%, precision of 91.74%, and sensitivity of 91.69%. It also yielded the highest Kappa (83.39), MCC (83.44), and ROC-AUC (97.32) scores, while maintaining the lowest MAE, MSE, and RMSE values (8.31, 8.31, and 28.82, respectively). This demonstrates its robustness in both classification performance and error minimization.
Deep learning models such as DeepLPI, Komet, and BarlowDTI also performed competitively, with Komet achieving a strong ROC-AUC of 96.52% and a low RMSE of 30.66. When the threshold was increased to 20, RFC once again demonstrated superior performance, achieving 89.78% accuracy, 89.81% precision, and a ROC-AUC of 96.28%, outperforming all other models. Meanwhile, DeepLPI experienced a significant performance drop at this threshold, particularly with a low specificity of 8.94% and a ROC-AUC of 34.43, indicating its sensitivity to threshold tuning. Komet, however, maintained robust metrics with an accuracy of 88.63% and a ROC-AUC of 95.69%, further confirming its consistency.

At the 30 threshold level, RFC continued to lead with an accuracy of 88.62%, precision of 88.64%, and ROC-AUC of 95.49%. It again registered the lowest error metrics, with MAE, MSE, and RMSE of 11.38, 11.38, and 33.73, respectively. Among the DL models, DeepLPI and BarlowDTI showed improved performance compared to the 20-threshold scenario, with DeepLPI achieving 86.96% accuracy and BarlowDTI slightly ahead at 87.09%.

Across all thresholds, RFC consistently demonstrated strong generalizability and resilience, making it the top-performing model for the BindingDB-Ki dataset. Komet and DeepLPI exhibited competitive performance, especially at lower thresholds, while models such as DTC and FCNN, although reasonably accurate, lagged in precision, AUC, and error minimization. Notably, the inclusion of attention mechanisms in MHA-FCNN did not significantly enhance performance in this context.
Table 5.
Performance analysis of ML and DL models on BindingDB-Ki.
| Dataset | Threshold | Model | Accuracy | Precision | Sensitivity | Specificity | F1-score | Kappa | MCC | ROC-AUC | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindingDB_Ki | 10 | DTC | 89.68 | 89.68 | 89.68 | 89.92 | 89.68 | 79.36 | 79.36 | 90.70 | 10.32 | 10.32 | 32.13 |
| | | MLP | 89.61 | 89.66 | 89.61 | 91.35 | 89.60 | 79.22 | 79.27 | 96.14 | 10.39 | 10.39 | 32.24 |
| | | RFC | 91.69 | 91.74 | 91.69 | 93.40 | 91.69 | 83.39 | 83.44 | 97.32 | 8.31 | 8.31 | 28.82 |
| | | FCNN | 89.09 | 89.74 | 89.09 | 95.42 | 89.05 | 78.19 | 78.83 | 95.63 | 10.91 | 10.91 | 33.03 |
| | | MHA-FCNN | 88.88 | 89.54 | 88.88 | 95.29 | 88.83 | 77.77 | 78.42 | 95.45 | 11.12 | 11.12 | 33.35 |
| | | DeepLPI | 90.52 | 90.58 | 90.52 | 92.28 | 90.52 | 81.05 | 81.10 | 96.63 | 9.48 | 9.48 | 30.79 |
| | | BarlowDTI | 90.27 | 90.27 | 90.27 | 90.40 | 90.27 | 80.54 | 80.54 | 96.69 | 9.73 | 9.73 | 31.20 |
| | | Komet | 90.60 | 90.96 | 90.60 | 95.21 | 90.58 | 81.21 | 81.56 | 96.52 | 9.40 | 9.40 | 30.66 |
| | 20 | DTC | 87.24 | 87.24 | 87.24 | 87.69 | 87.24 | 74.48 | 74.49 | 88.43 | 12.76 | 12.76 | 35.72 |
| | | MLP | 86.93 | 86.97 | 86.93 | 88.53 | 86.93 | 73.86 | 73.90 | 94.51 | 13.07 | 13.07 | 36.15 |
| | | RFC | 89.78 | 89.81 | 89.78 | 91.30 | 89.77 | 79.55 | 79.59 | 96.28 | 10.22 | 10.22 | 31.97 |
| | | FCNN | 86.69 | 87.26 | 86.69 | 92.87 | 86.63 | 73.37 | 73.94 | 94.32 | 13.31 | 13.31 | 36.49 |
| | | MHA-FCNN | 86.67 | 86.96 | 86.67 | 91.08 | 86.64 | 73.34 | 73.63 | 94.05 | 13.33 | 13.33 | 36.51 |
| | | DeepLPI | 53.56 | 67.21 | 53.56 | 8.94 | 42.03 | 7.06 | 15.59 | 34.43 | 46.44 | 46.44 | 68.15 |
| | | BarlowDTI | 88.20 | 88.20 | 88.20 | 88.33 | 88.20 | 76.39 | 76.39 | 95.38 | 11.80 | 11.80 | 34.36 |
| | | Komet | 88.63 | 88.67 | 88.63 | 90.24 | 88.63 | 77.26 | 77.30 | 95.69 | 11.37 | 11.37 | 33.72 |
| | 30 | DTC | 85.91 | 85.92 | 85.91 | 86.26 | 85.91 | 71.83 | 71.83 | 87.23 | 14.09 | 14.09 | 37.53 |
| | | MLP | 85.26 | 85.30 | 85.26 | 86.86 | 85.25 | 70.52 | 70.56 | 93.29 | 14.74 | 14.74 | 38.40 |
| | | RFC | 88.62 | 88.64 | 88.62 | 89.68 | 88.62 | 77.24 | 77.26 | 95.49 | 11.38 | 11.38 | 33.73 |
| | | FCNN | 84.35 | 84.88 | 84.35 | 90.43 | 84.30 | 68.72 | 69.24 | 92.57 | 15.65 | 15.65 | 39.56 |
| | | MHA-FCNN | 84.31 | 84.57 | 84.31 | 88.50 | 84.28 | 68.63 | 68.88 | 92.24 | 15.69 | 15.69 | 39.61 |
| | | DeepLPI | 86.96 | 86.96 | 86.96 | 86.64 | 86.96 | 73.92 | 73.92 | 94.63 | 13.04 | 13.04 | 36.11 |
| | | BarlowDTI | 87.09 | 87.09 | 87.09 | 86.81 | 87.09 | 74.18 | 74.18 | 94.54 | 12.91 | 12.91 | 35.93 |
| | | Komet | 87.07 | 87.09 | 87.07 | 87.96 | 87.07 | 74.15 | 74.16 | 94.61 | 12.93 | 12.93 | 35.95 |
The performance of various Machine Learning (ML) and Deep Learning (DL) models was evaluated on the BindingDB-IC50 dataset using three different activity thresholds: 10, 20, and 30. Each model was assessed using multiple metrics, including Accuracy, Precision, Sensitivity, Specificity, F1-score, Cohen’s Kappa, Matthews Correlation Coefficient (MCC), ROC-AUC, Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) (Table 6). Across all thresholds, the Random Forest Classifier (RFC) consistently demonstrated the highest overall performance. At the 10 threshold, RFC achieved the best results, including an accuracy of 95.40%, precision of 95.41%, F1-score of 95.39%, and the highest ROC-AUC of 98.97%, while also recording the lowest error metrics (MAE: 4.60, MSE: 4.60, RMSE: 21.46). These results highlight its robust capability in binary classification tasks related to compound activity. Other models such as MLP, DeepLPI, and BarlowDTI also demonstrated strong performance, particularly in terms of ROC-AUC and specificity, but generally lagged slightly behind RFC in accuracy and consistency across metrics. For instance, BarlowDTI and Komet achieved accuracies of 94.59% and 94.49%, respectively, while maintaining competitive AUCs above 98.00%.

At the 20 threshold, a similar trend was observed, with RFC again leading in accuracy (93.75%) and in other metrics such as F1-score (93.75), MCC (87.52), and ROC-AUC (98.37). Although most models showed a marginal decrease in performance compared to the 10 threshold, RFC retained a noticeable edge.

When the threshold was raised to 30, overall model performance declined slightly, as expected due to reduced class separability. Nevertheless, RFC continued to outperform the others with an accuracy of 92.90%, an F1-score of 92.89%, and a ROC-AUC of 98.01%, while also maintaining lower error rates (MAE: 7.10, RMSE: 26.65). Other models, including DeepLPI and Komet, still performed relatively well, though with a larger performance gap from RFC.

Overall, these results establish RFC as the most effective model for IC50 classification on the BindingDB dataset across various thresholds, indicating its strong generalization ability and suitability for compound activity prediction tasks. Moreover, the competitive performances of DL-based models such as FCNN, MHA-FCNN, and DeepLPI reinforce the potential of deep learning, especially when paired with interpretability enhancements in future work.
Table 6.
Performance analysis of ML and DL models on BindingDB-IC50.
| Dataset | Threshold | Model | Accuracy | Precision | Sensitivity | Specificity | F1-score | Kappa | MCC | ROC-AUC | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindingDB_IC50 | 10 | DTC | 94.12 | 94.12 | 94.12 | 94.04 | 94.12 | 88.23 | 88.23 | 94.77 | 5.88 | 5.88 | 24.26 |
| | | MLP | 94.27 | 94.39 | 94.27 | 96.92 | 94.26 | 88.53 | 88.66 | 98.36 | 5.73 | 5.73 | 23.94 |
| | | RFC | 95.40 | 95.41 | 95.40 | 96.42 | 95.39 | 90.79 | 90.81 | 98.97 | 4.60 | 4.60 | 21.46 |
| | | FCNN | 93.92 | 94.37 | 93.92 | 98.93 | 93.91 | 87.85 | 88.29 | 98.13 | 6.08 | 6.08 | 24.65 |
| | | MHA-FCNN | 94.16 | 94.45 | 94.16 | 98.20 | 94.15 | 88.32 | 88.61 | 98.05 | 5.84 | 5.84 | 24.16 |
| | | DeepLPI | 94.02 | 94.08 | 94.02 | 95.95 | 94.02 | 88.04 | 88.10 | 97.86 | 5.98 | 5.98 | 24.46 |
| | | BarlowDTI | 94.59 | 94.70 | 94.59 | 97.03 | 94.59 | 89.18 | 89.29 | 98.23 | 5.41 | 5.41 | 23.26 |
| | | Komet | 94.49 | 94.76 | 94.49 | 98.41 | 94.48 | 88.97 | 89.25 | 98.10 | 5.51 | 5.51 | 23.48 |
| | 20 | DTC | 92.04 | 92.04 | 92.04 | 92.11 | 92.04 | 84.09 | 84.09 | 92.91 | 7.96 | 7.96 | 28.21 |
| | | MLP | 91.97 | 92.16 | 91.97 | 95.33 | 91.96 | 83.93 | 84.12 | 97.36 | 8.03 | 8.03 | 28.35 |
| | | RFC | 93.75 | 93.78 | 93.75 | 95.02 | 93.75 | 87.50 | 87.52 | 98.37 | 6.25 | 6.25 | 25.00 |
| | | FCNN | 91.53 | 92.10 | 91.53 | 97.30 | 91.51 | 83.07 | 83.63 | 97.05 | 8.47 | 8.47 | 29.09 |
| | | MHA-FCNN | 91.70 | 91.98 | 91.70 | 95.79 | 91.68 | 83.40 | 83.68 | 96.88 | 8.30 | 8.30 | 28.81 |
| | | DeepLPI | 92.46 | 92.58 | 92.46 | 95.14 | 92.46 | 84.92 | 85.04 | 97.34 | 7.54 | 7.54 | 27.46 |
| | | BarlowDTI | 92.45 | 92.48 | 92.45 | 93.67 | 92.45 | 84.91 | 84.93 | 97.54 | 7.55 | 7.55 | 27.47 |
| | | Komet | 92.29 | 92.72 | 92.29 | 97.32 | 92.27 | 84.58 | 85.01 | 97.34 | 7.71 | 7.71 | 27.77 |
| | 30 | DTC | 90.83 | 90.83 | 90.83 | 90.83 | 90.83 | 81.66 | 81.66 | 91.77 | 9.17 | 9.17 | 30.28 |
| | | MLP | 90.42 | 90.61 | 90.42 | 93.87 | 90.41 | 80.85 | 81.04 | 96.61 | 9.58 | 9.58 | 30.95 |
| | | RFC | 92.90 | 92.92 | 92.90 | 94.13 | 92.89 | 85.79 | 85.82 | 98.01 | 7.10 | 7.10 | 26.65 |
| | | FCNN | 90.00 | 90.69 | 90.00 | 96.52 | 89.96 | 80.00 | 80.69 | 96.24 | 10.00 | 10.00 | 31.62 |
| | | MHA-FCNN | 90.17 | 90.64 | 90.17 | 95.58 | 90.14 | 80.34 | 80.81 | 96.17 | 9.83 | 9.83 | 31.35 |
| | | DeepLPI | 91.13 | 91.31 | 91.13 | 94.48 | 91.11 | 82.25 | 82.43 | 96.69 | 8.87 | 8.87 | 29.79 |
| | | BarlowDTI | 90.79 | 90.85 | 90.79 | 88.99 | 90.79 | 81.59 | 81.64 | 97.25 | 9.21 | 9.21 | 30.34 |
| | | Komet | 91.32 | 91.62 | 91.32 | 95.56 | 91.31 | 82.64 | 82.94 | 96.96 | 8.68 | 8.68 | 29.46 |
The confusion matrix analysis of the proposed GAN+RFC hybrid model across the BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50 datasets reveals its strong capability in accurately predicting drug-target interactions, particularly under conditions of class imbalance. On the BindingDB-Kd dataset (Fig. 5), the GAN+RFC model exhibits excellent performance, with a high number of true positives and true negatives for both the “Yes” and “No” classes. The model effectively minimizes false positives and false negatives, especially for the “Yes” class, which denotes the presence of drug-target interactions. This outcome underscores the effectiveness of the GAN-based data augmentation in mitigating class imbalance and enhancing the classifier’s sensitivity to minority class patterns. The hybrid framework not only improves the model’s precision and recall but also enhances its ability to generalize across different interaction types.

Similarly, on the BindingDB-Ki dataset (Fig. 6), the confusion matrix further validates the robust predictive power of the GAN+RFC model. The classifier achieves a high proportion of correct predictions for both classes, with significantly reduced misclassification rates. The “Yes” class again benefits from the data balancing mechanism provided by the GAN, resulting in an increased number of correctly predicted positive interactions. This highlights the model’s ability to learn meaningful features that distinguish between interacting and non-interacting drug-target pairs, even in imbalanced data scenarios.

On the BindingDB-IC50 dataset (Fig. 7), the confusion matrix demonstrates consistent performance, with the GAN+RFC model maintaining high accuracy across both classes. Despite the complexity and variability in this dataset, the model achieves reliable results, suggesting its adaptability and robustness across different drug interaction measurement conditions (Kd, Ki, and IC50).
In summary, the confusion matrix analysis across all three datasets affirms the effectiveness of the hybrid GAN+RFC model. Its ability to handle imbalanced data while maintaining high classification accuracy makes it a powerful and generalizable tool for drug-target interaction prediction in computational drug discovery pipelines.
Fig. 5.
Confusion matrix of ML and DL models on BindingDB-Kd Data Using GAN.
Fig. 6.
Confusion matrix of ML and DL models on BindingDB-Ki Data Using GAN.
Fig. 7.
Confusion matrix of ML and DL models on BindingDB-IC50 Data Using GAN.
The classification report analysis across the BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50 datasets underscores the superior performance of the proposed hybrid GAN+RFC model when compared to traditional machine learning and deep learning approaches (Table 7). The application of GAN for data balancing, coupled with the robust classification capabilities of Random Forest, resulted in consistently high precision, recall, and F1-scores, particularly for the “Yes” class, which is often underrepresented in imbalanced datasets.

On the BindingDB-Kd dataset, the GAN+RFC model achieved the highest F1-score of 97.42% for the “Yes” class, with a precision of 98.78% and recall of 96.09%. These results indicate the model’s strong ability to accurately identify true drug-target interactions while minimizing false positives and negatives. In contrast, while models like FCNN and MHA-FCNN also performed well, their slightly lower recall values suggest a relatively higher tendency to misclassify positive samples compared to the GAN+RFC model.

A similar trend is observed on the BindingDB-Ki dataset. Here, the GAN+RFC model again outperformed its counterparts, achieving a precision of 93.22%, recall of 90.00%, and F1-score of 91.58% for the “Yes” class. This consistent performance highlights the model’s generalizability across datasets with different bioactivity measures. Although deep learning models such as DeepLPI and BarlowDTI showed competitive results, especially in terms of precision, their recall values were comparatively lower, which could lead to missed positive interactions, a critical concern in drug discovery.

In the case of the BindingDB-IC50 dataset, the GAN+RFC model maintained its dominance, delivering an F1-score of 95.34% for the “Yes” class. Notably, this performance surpasses models like FCNN and MHA-FCNN, which achieved slightly lower recall values despite high precision.
The results reaffirm the effectiveness of combining generative data augmentation with a powerful ensemble classifier, ensuring not only high accuracy but also robustness in imbalanced classification settings. In conclusion, the classification report analysis demonstrates that the hybrid GAN+RFC model consistently achieves superior performance across all three datasets, with particular strength in identifying positive drug-target interactions. By effectively addressing class imbalance and maintaining high predictive accuracy, this hybrid approach provides a reliable and scalable solution for drug-target interaction prediction, positioning itself as a strong candidate for deployment in real-world drug discovery pipelines.
Table 7.
Classification report analysis of ML and DL model on DTI datasets.
| BindingDB-Kd | | | | | BindingDB-Ki | | | | | BindingDB-IC50 | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ML/DL | Class | Precision | Recall | F1-Score | ML/DL | Class | Precision | Recall | F1-Score | ML/DL | Class | Precision | Recall | F1-Score |
| DTC | No | 96.49 | 96.45 | 96.47 | DTC | No | 89.40 | 89.92 | 89.66 | DTC | No | 94.20 | 94.04 | 94.12 |
| DTC | Yes | 96.44 | 96.49 | 96.46 | DTC | Yes | 89.95 | 89.43 | 89.69 | DTC | Yes | 94.03 | 94.20 | 94.11 |
| MLP | No | 96.41 | 97.91 | 97.15 | MLP | No | 88.20 | 91.35 | 89.74 | MLP | No | 92.05 | 96.92 | 94.42 |
| MLP | Yes | 97.87 | 96.35 | 97.10 | MLP | Yes | 91.11 | 87.88 | 89.47 | MLP | Yes | 96.74 | 91.60 | 94.10 |
| RFC | No | 96.20 | 98.82 | 97.49 | RFC | No | 90.25 | 93.40 | 91.80 | RFC | No | 94.50 | 96.42 | 95.45 |
| RFC | Yes | 98.78 | 96.09 | 97.42 | RFC | Yes | 93.22 | 90.00 | 91.58 | RFC | Yes | 96.34 | 94.37 | 95.34 |
| FCNN | No | 95.59 | 98.84 | 97.19 | FCNN | No | 84.62 | 95.42 | 89.70 | FCNN | No | 89.94 | 98.93 | 94.22 |
| FCNN | Yes | 98.80 | 95.43 | 97.09 | FCNN | Yes | 94.81 | 82.81 | 88.41 | FCNN | Yes | 98.81 | 88.90 | 93.59 |
| MHA-FCNN | No | 95.85 | 98.70 | 97.26 | MHA-FCNN | No | 84.38 | 95.29 | 89.51 | MHA-FCNN | No | 90.88 | 98.20 | 94.40 |
| MHA-FCNN | Yes | 98.66 | 95.72 | 97.17 | MHA-FCNN | Yes | 94.65 | 82.52 | 88.17 | MHA-FCNN | Yes | 98.03 | 90.12 | 93.91 |
| DeepLPI | No | 96.24 | 97.88 | 97.05 | DeepLPI | No | 89.07 | 92.28 | 90.65 | DeepLPI | No | 92.40 | 95.95 | 94.14 |
| DeepLPI | Yes | 97.84 | 96.17 | 97.00 | DeepLPI | Yes | 92.07 | 88.78 | 90.39 | DeepLPI | Yes | 95.78 | 92.08 | 93.89 |
| BarlowDTI | No | 96.42 | 97.96 | 97.19 | BarlowDTI | No | 90.09 | 90.40 | 90.24 | BarlowDTI | No | 92.54 | 97.03 | 94.73 |
| BarlowDTI | Yes | 97.92 | 96.36 | 97.13 | BarlowDTI | Yes | 90.45 | 90.14 | 90.29 | BarlowDTI | Yes | 96.86 | 92.15 | 94.45 |
| Komet | No | 96.31 | 98.59 | 97.44 | Komet | No | 87.11 | 95.21 | 90.98 | Komet | No | 91.26 | 98.41 | 94.70 |
| Komet | Yes | 98.55 | 96.22 | 97.37 | Komet | Yes | 94.77 | 86.04 | 90.19 | Komet | Yes | 98.27 | 90.55 | 94.25 |
Statistical analysis
The performance of the models across the updated datasets was further assessed using the Friedman test, a non-parametric statistical test designed to identify differences in performance among multiple models when the assumptions of parametric tests, such as normality, are not met. The test was applied to the BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50 datasets. For the BindingDB-Kd dataset, the Friedman test statistic was 7.0000, with a p-value of 0.4289. Since the p-value is considerably greater than the standard significance threshold of 0.05, we conclude that there are no statistically significant differences in performance among the models. This indicates that all models (RFC, MLP, DTC, FCNN, MHA-FCNN, DeepLPI, BarlowDTI, and Komet) exhibited comparable performance despite variations in individual metric scores. A similar trend was observed for both the BindingDB-Ki and BindingDB-IC50 datasets, where the Friedman test also returned a statistic of 7.0000 and a p-value of 0.4289 for each. These consistent results across all three datasets confirm that the observed differences in model metrics are not statistically significant, and the models perform similarly in predicting outcomes on these datasets.
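For reference, the Friedman test can be run with SciPy by treating each evaluation metric as a block and each model as a treatment. The scores below are illustrative placeholders, not the study's exact values, so the resulting statistic and p-value will differ from those reported above:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Rows: evaluation metrics (blocks); columns: the eight models (treatments).
# Values are illustrative placeholders, not the paper's exact scores.
scores = np.array([
    [97.46, 97.13, 96.47, 97.15, 97.20, 97.02, 97.16, 97.40],  # accuracy
    [97.49, 97.14, 96.47, 97.19, 97.26, 97.05, 97.19, 97.44],  # precision
    [97.46, 97.13, 96.47, 97.09, 97.17, 97.00, 97.13, 97.37],  # F1-score
    [99.42, 99.15, 96.66, 98.90, 98.95, 98.70, 99.00, 99.10],  # ROC-AUC
])

# friedmanchisquare expects one sequence of measurements per model
stat, p = friedmanchisquare(*scores.T)
print(f"Friedman statistic = {stat:.4f}, p-value = {p:.4f}")
if p > 0.05:
    print("No statistically significant difference among models")
```

With eight models, the statistic is compared against a chi-squared distribution with seven degrees of freedom, which is why a statistic of 7.0000 yields the reported p-value of 0.4289.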
Analysis on varying degrees of imbalanced datasets
The performance analysis on datasets with varying degrees of imbalance demonstrates the effectiveness of balancing techniques, particularly GAN-based augmentation, in improving classification results across different levels of class imbalance (Table 8). The analysis focuses on three datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50, evaluating the impact of extreme, moderate, and balanced class distributions. For the BindingDB-Kd dataset, the extreme imbalance case showed a noticeable drop in performance, with an accuracy of 93.69% and a relatively low F1-score of 93.27%. When the dataset became moderately imbalanced, performance improved slightly, with an accuracy of 94.64% and an F1-score of 94.12%. However, the most significant improvement occurred when the dataset was balanced using GAN augmentation, resulting in an accuracy of 97.46%, precision of 97.49%, sensitivity of 97.46%, and specificity of 98.82%. The F1-score of 97.46% further underscores the effectiveness of GAN in handling class imbalance, leading to a well-rounded improvement in model performance. Similarly, for the BindingDB-Ki dataset, the extreme imbalance led to lower performance, with accuracy at 84.87% and an F1-score of 84.81%. With moderate balancing, accuracy rose to 87.22%, and the F1-score increased to 86.97%. The balanced case (using 10+GAN) resulted in a notable enhancement, reaching an accuracy of 91.69%, precision of 91.74%, sensitivity of 91.69%, and specificity of 93.40%, with an F1-score of 91.69%. This demonstrates that GAN augmentation can significantly enhance model performance, especially for datasets with initially high class imbalance. For the BindingDB-IC50 dataset, extreme class imbalance led to an accuracy of 88.76% and an F1-score of 88.57%, which are comparatively lower than for the other two datasets. After applying moderate balancing, model performance improved, achieving an accuracy of 91.99% and an F1-score of 91.71%. 
The balanced case using 10+GAN provided the most significant boost, achieving an accuracy of 95.40%, precision of 95.41%, sensitivity of 95.40%, and specificity of 96.42%, with an F1-score of 95.39%. This improvement across all performance metrics highlights the substantial benefits of GAN augmentation, particularly in addressing the challenges of extreme class imbalance. The analysis reveals the importance of addressing class imbalance for achieving optimal performance. For all three datasets, the GAN-based balancing technique significantly improved the model’s ability to correctly classify both classes, particularly the underrepresented “Yes” class. By balancing the datasets, the 10+GAN approach consistently delivered superior performance across accuracy, precision, sensitivity, specificity, and F1-score, making it a highly effective solution for imbalanced drug-target interaction prediction tasks.
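The aggregate metrics discussed above follow standard definitions derived from the confusion matrix. As a hedged illustration (the labels here are toy data, not drawn from the study), the weighted-average metrics and specificity can be computed with scikit-learn as follows:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Toy binary labels: 1 = interaction ("Yes"), 0 = no interaction ("No")
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="weighted")
sens = recall_score(y_true, y_pred, average="weighted")  # weighted recall
spec = tn / (tn + fp)  # specificity: recall of the negative ("No") class
f1   = f1_score(y_true, y_pred, average="weighted")
print(f"Acc={acc:.4f} Prec={prec:.4f} Sens={sens:.4f} "
      f"Spec={spec:.4f} F1={f1:.4f}")
```

The weighted average explains why accuracy and sensitivity coincide in the reported tables: weighted recall over both classes equals overall accuracy.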
Table 8.
Performance analysis on datasets with varying degrees of imbalance.
| Dataset | Balancing | Class 0 / No | Class 1 / Yes | Accuracy | Precision | Sensitivity | Specificity | F1score | Threshold |
|---|---|---|---|---|---|---|---|---|---|
| BindingDB_Kd | Extreme Imbalanced | 37385 | 4851 | 93.69 | 93.24 | 93.69 | 98.08 | 93.27 | 30 |
| | Moderate Imbalanced | 38910 | 3326 | 94.64 | 94.10 | 94.64 | 98.63 | 94.12 | 10 |
| | Balanced | 38910 | 38910 | 97.46 | 97.49 | 97.46 | 98.82 | 97.46 | 10+GAN |
| BindingDB_Ki | Extreme Imbalanced | 197572 | 99113 | 84.87 | 84.77 | 84.87 | 89.45 | 84.81 | 30 |
| | Moderate Imbalanced | 227658 | 69027 | 87.22 | 86.86 | 87.22 | 93.20 | 86.97 | 10 |
| | Balanced | 227658 | 227658 | 91.69 | 91.74 | 91.69 | 93.40 | 91.69 | 10+GAN |
| BindingDB_IC50 | Extreme Imbalanced | 597037 | 169867 | 88.76 | 88.47 | 88.76 | 94.01 | 88.57 | 30 |
| | Moderate Imbalanced | 664750 | 102154 | 91.99 | 91.57 | 91.99 | 96.54 | 91.71 | 10 |
| | Balanced | 664750 | 664750 | 95.40 | 95.41 | 95.40 | 96.42 | 95.39 | 10+GAN |
Effect of data balancing
The inclusion of data balancing, achieved through the GAN module, significantly enhances the performance of the models across all benchmark datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50 (Table 9). This improvement is evident when comparing models trained on unbalanced datasets with those utilizing the GAN module for data balancing. Specifically, the proposed GAN+RFC framework consistently outperforms other configurations in accuracy, sensitivity, specificity, and AUC-ROC, while also reducing error metrics such as MAE, MSE, and RMSE. On the BindingDB-Kd dataset, for instance, GAN+RFC achieves an accuracy of 97.46%, a sensitivity of 97.46%, and an AUC-ROC of 99.42, as opposed to an accuracy of 94.64% and an AUC-ROC of 93.17 for the unbalanced RFC model. Similarly, across the BindingDB-Ki and BindingDB-IC50 datasets, the GAN-enhanced models demonstrate marked improvements in predictive metrics, underscoring the critical role of data balancing in mitigating biases and improving generalizability. These results highlight the transformative impact of the GAN module in effectively capturing and enhancing patterns in imbalanced datasets, leading to robust and reliable predictive performance.
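To make the balancing pipeline concrete, the sketch below mimics the augmentation step: synthetic minority-class rows are generated until both classes are equal in size, and an RFC is then trained on the balanced set. Note that the generator here is a simple noise-perturbed resampler standing in for a trained GAN generator, and all data and dimensions are illustrative, not the study's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Toy imbalanced data standing in for MACCS+composition feature vectors
X_maj = rng.normal(0.0, 1.0, size=(900, 20))   # majority ("No") class
X_min = rng.normal(1.5, 1.0, size=(100, 20))   # minority ("Yes") class

def generate_synthetic(n, template, rng):
    """Stand-in for a trained GAN generator: resample minority rows with
    small Gaussian perturbations. A real GAN maps noise vectors to samples."""
    idx = rng.integers(0, len(template), size=n)
    noise = rng.normal(0.0, 0.1, size=(n, template.shape[1]))
    return template[idx] + noise

n_needed = len(X_maj) - len(X_min)
X_syn = generate_synthetic(n_needed, X_min, rng)

# Balanced training set: majority + real minority + synthetic minority
X = np.vstack([X_maj, X_min, X_syn])
y = np.concatenate([np.zeros(len(X_maj)), np.ones(len(X_min) + n_needed)])

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
print("class counts after balancing:", np.bincount(y.astype(int)))
```

Only the synthetic-sample source differs in the actual framework; the downstream RFC training is unchanged.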
Table 9.
Effect of data balancing.
| Dataset | Data Balancing | Model | Accuracy | Precision | Sensitivity | Specificity | F1score | Kappa | MCC | ROC-AUC | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindingDB_Kd | No | DTC | 92.84 | 92.70 | 92.84 | 96.30 | 92.76 | 51.91 | 51.93 | 76.97 | 7.16 | 7.16 | 26.76 |
| | | MLP | 94.15 | 93.73 | 94.15 | 97.63 | 93.89 | 58.10 | 58.43 | 88.84 | 5.85 | 5.85 | 24.18 |
| | | RFC | 94.64 | 94.10 | 94.64 | 98.63 | 94.12 | 58.17 | 59.67 | 93.17 | 5.36 | 5.36 | 23.16 |
| | GAN | DTC | 96.47 | 96.47 | 96.47 | 96.45 | 96.47 | 92.93 | 92.93 | 96.66 | 3.53 | 3.53 | 18.80 |
| | | MLP | 97.13 | 97.14 | 97.13 | 97.91 | 97.13 | 94.26 | 94.27 | 99.15 | 2.87 | 2.87 | 16.95 |
| | | RFC | 97.46 | 97.49 | 97.46 | 98.82 | 97.46 | 94.91 | 94.95 | 99.42 | 2.54 | 2.54 | 15.95 |
| BindingDB_Ki | No | DTC | 84.26 | 84.24 | 84.26 | 89.81 | 84.25 | 56.01 | 56.01 | 80.51 | 15.74 | 15.74 | 39.67 |
| | | MLP | 84.45 | 83.73 | 84.45 | 92.53 | 83.90 | 53.79 | 54.19 | 87.68 | 15.55 | 15.55 | 39.44 |
| | | RFC | 87.22 | 86.86 | 87.22 | 93.20 | 86.97 | 63.03 | 63.18 | 91.49 | 12.78 | 12.78 | 35.75 |
| | GAN | DTC | 89.68 | 89.68 | 89.68 | 89.92 | 89.68 | 79.36 | 79.36 | 90.70 | 10.32 | 10.32 | 32.13 |
| | | MLP | 89.61 | 89.66 | 89.61 | 91.35 | 89.60 | 79.22 | 79.27 | 96.14 | 10.39 | 10.39 | 32.24 |
| | | RFC | 91.69 | 91.74 | 91.69 | 93.40 | 91.69 | 83.39 | 83.44 | 97.32 | 8.31 | 8.31 | 28.82 |
| BindingDB_IC50 | No | DTC | 89.74 | 89.71 | 89.74 | 94.14 | 89.73 | 55.52 | 55.52 | 80.79 | 10.26 | 10.26 | 32.03 |
| | | MLP | 89.94 | 89.18 | 89.94 | 95.78 | 89.44 | 52.33 | 52.79 | 89.34 | 10.06 | 10.06 | 31.72 |
| | | RFC | 91.99 | 91.57 | 91.99 | 96.54 | 91.71 | 63.03 | 63.30 | 93.41 | 8.01 | 8.01 | 28.30 |
| | GAN | DTC | 94.12 | 94.12 | 94.12 | 94.04 | 94.12 | 88.23 | 88.23 | 94.77 | 5.88 | 5.88 | 24.26 |
| | | MLP | 94.27 | 94.39 | 94.27 | 96.92 | 94.26 | 88.53 | 88.66 | 98.36 | 5.73 | 5.73 | 23.94 |
| | | RFC | 95.40 | 95.41 | 95.40 | 96.42 | 95.39 | 90.79 | 90.81 | 98.97 | 4.60 | 4.60 | 21.46 |
Comparative study of ACC and DC
The performance comparison between Amino Acid Composition (ACC) and Dipeptide Composition (DC) using various machine learning (ML) and deep learning (DL) models is essential for understanding how different feature extraction methods affect results on Drug-Target Interaction (DTI) datasets (Table 10). This study evaluates models on three datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50, with key performance metrics including Accuracy, Precision, Sensitivity, Specificity, F1-Score, Kappa, MCC, ROC-AUC, MAE, MSE, and RMSE. For the BindingDB-Kd dataset, ACC generally outperforms DC. With ACC, the Random Forest Classifier (RFC) achieved the highest performance, with an accuracy of 97.46%, precision of 97.49%, sensitivity of 97.46%, specificity of 98.82%, and an F1-score of 97.46%. With DC, the RFC model delivered a slightly lower accuracy of 97.09%; precision and sensitivity remained strong, but overall performance lagged behind ACC on key metrics such as F1-score and ROC-AUC. The BindingDB-Ki dataset shows a similar trend, with ACC features exhibiting superior performance over DC. With ACC, the RFC model achieved an accuracy of 91.69%, slightly higher than the DC version of the same model at 91.61%. The RFC with ACC also outperformed in terms of precision, sensitivity, and F1-score, with an F1-score of 91.69%, compared to 91.61% for DC. The BindingDB-IC50 dataset likewise supports the trend observed in the other datasets, with ACC outperforming DC. The RFC model again showed the best performance with ACC, achieving an accuracy of 95.40%, precision of 95.41%, sensitivity of 95.40%, and specificity of 96.42%. With DC, the RFC model exhibited slightly lower values across the metrics, with an accuracy of 95.37% and comparable reductions in precision and sensitivity. 
Across all datasets and models tested, Amino Acid Composition (ACC) consistently led to better performance in terms of accuracy, precision, sensitivity, and F1-score compared to Dipeptide Composition (DC). This comparative study suggests that ACC is the more effective target representation for Drug-Target Interaction tasks, delivering stronger performance across multiple metrics while using a far more compact feature vector (20 versus 400 dimensions).
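For reference, both target feature sets compared here can be computed directly from a protein sequence: ACC is the 20-dimensional vector of residue frequencies, and DC is the 400-dimensional vector of ordered residue-pair frequencies. A minimal sketch (the sequence is a toy example):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each of the 20 amino acids (20-dim vector)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered amino-acid pair (400-dim vector)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy target sequence
acc_vec = amino_acid_composition(seq)
dc_vec = dipeptide_composition(seq)
print(len(acc_vec), len(dc_vec))
```

Both vectors sum to 1 by construction, which makes them comparable across targets of different lengths.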
Table 10.
Comparison analysis of ACC and DC using ML/DL on DTI datasets.
| Dataset | Composition | Model | Accuracy | Precision | Sensitivity | Specificity | F1score | Kappa | MCC | ROC-AUC | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindingDB_Kd | ACC | DTC | 96.47 | 96.47 | 96.47 | 96.45 | 96.47 | 92.93 | 92.93 | 96.66 | 3.53 | 3.53 | 18.80 |
| | | MLP | 97.13 | 97.14 | 97.13 | 97.91 | 97.13 | 94.26 | 94.27 | 99.15 | 2.87 | 2.87 | 16.95 |
| | | RFC | 97.46 | 97.49 | 97.46 | 98.82 | 97.46 | 94.91 | 94.95 | 99.42 | 2.54 | 2.54 | 15.95 |
| | DC | DTC | 96.01 | 96.01 | 96.01 | 95.75 | 96.01 | 92.02 | 92.02 | 96.24 | 3.99 | 3.99 | 19.97 |
| | | MLP | 96.95 | 96.96 | 96.95 | 97.75 | 96.95 | 93.90 | 93.91 | 99.43 | 3.05 | 3.05 | 17.47 |
| | | RFC | 97.09 | 97.12 | 97.09 | 98.22 | 97.09 | 94.18 | 94.21 | 99.31 | 2.91 | 2.91 | 17.06 |
| BindingDB_Ki | ACC | DTC | 89.68 | 89.68 | 89.68 | 89.92 | 89.68 | 79.36 | 79.36 | 90.70 | 10.32 | 10.32 | 32.13 |
| | | MLP | 89.61 | 89.66 | 89.61 | 91.35 | 89.60 | 79.22 | 79.27 | 96.14 | 10.39 | 10.39 | 32.24 |
| | | RFC | 91.69 | 91.74 | 91.69 | 93.40 | 91.69 | 83.39 | 83.44 | 97.32 | 8.31 | 8.31 | 28.82 |
| | DC | DTC | 89.75 | 89.75 | 89.75 | 89.82 | 89.75 | 79.49 | 79.49 | 90.83 | 10.25 | 10.25 | 32.02 |
| | | MLP | 90.71 | 90.77 | 90.71 | 92.62 | 90.70 | 81.41 | 81.47 | 97.01 | 9.29 | 9.29 | 30.49 |
| | | RFC | 91.61 | 91.64 | 91.61 | 92.77 | 91.61 | 83.22 | 83.25 | 97.39 | 8.39 | 8.39 | 28.96 |
| BindingDB_IC50 | ACC | DTC | 94.12 | 94.12 | 94.12 | 94.04 | 94.12 | 88.23 | 88.23 | 94.77 | 5.88 | 5.88 | 24.26 |
| | | MLP | 94.27 | 94.39 | 94.27 | 96.92 | 94.26 | 88.53 | 88.66 | 98.36 | 5.73 | 5.73 | 23.94 |
| | | RFC | 95.40 | 95.41 | 95.40 | 96.42 | 95.39 | 90.79 | 90.81 | 98.97 | 4.60 | 4.60 | 21.46 |
| | DC | DTC | 94.11 | 94.11 | 94.11 | 94.10 | 94.11 | 88.22 | 88.22 | 94.80 | 5.89 | 5.89 | 24.27 |
| | | MLP | 94.69 | 94.71 | 94.69 | 95.77 | 94.69 | 89.38 | 89.41 | 98.85 | 5.31 | 5.31 | 23.04 |
| | | RFC | 95.37 | 95.39 | 95.37 | 96.32 | 95.37 | 90.74 | 90.75 | 98.99 | 4.63 | 4.63 | 21.52 |
Complexity analysis
The computational efficiency of machine learning (ML) and deep learning (DL) models plays a critical role in their practical application, particularly in Drug-Target Interaction (DTI) prediction tasks. In this section, we analyze the training time, prediction time, and total execution time for the models across three DTI datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. Table 11 presents the complexity analysis results for the models evaluated in this study. For the BindingDB-Kd dataset, the Decision Tree Classifier (DTC) is the fastest model overall (2.814 seconds total), and the Random Forest Classifier (RFC) is also highly efficient, requiring 11.697 seconds for training and 0.206 seconds for prediction, for a total execution time of 11.903 seconds. This is significantly faster than models such as DeepLPI (82.640 seconds) and BarlowDTI (130.469 seconds), which exhibit considerably longer training and prediction times. The Multi-layer Perceptron (MLP) and Fully Connected Neural Network (FCNN) also require substantial training time, at 57.502 and 56.124 seconds, respectively. For the BindingDB-Ki dataset, RFC again proves efficient, requiring 76.017 seconds for training and 2.423 seconds for prediction, yielding a total time of 78.440 seconds. This is far less than DL models such as BarlowDTI (736.850 seconds) and Komet (470.849 seconds). As with the BindingDB-Kd dataset, MLP exhibits high computational cost, requiring 553.865 seconds for training, while MHA-FCNN and DeepLPI also show considerable training times of around 330.942 and 454.559 seconds, respectively. For the BindingDB-IC50 dataset, computational cost increases for all models, particularly the deep learning-based ones. RFC remains time-efficient, with a training time of 244.949 seconds and a prediction time of 7.019 seconds, resulting in a total time of 251.967 seconds. 
In comparison, the BarlowDTI model exhibits the longest execution time, with a total of 2129.139 seconds; MLP is also demanding, taking 741.861 seconds for training and 0.550 seconds for prediction, with a total execution time of 742.411 seconds. The complexity analysis makes clear that traditional machine learning models such as DTC and RFC are significantly more efficient in training and prediction time than deep learning models like MLP, FCNN, and MHA-FCNN. While DTC is the fastest model outright, RFC offers the best balance of computational efficiency and predictive accuracy across all three datasets, making it a strong candidate for scenarios where rapid execution is critical. In contrast, deep learning models, while often powerful, require considerably more computational resources and time.
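The timing figures in Table 11 can be reproduced, in shape if not in magnitude, by wrapping the fit and predict calls with a wall-clock timer. A minimal sketch using synthetic data (dataset sizes and model settings are illustrative, not those of the study):

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a DTI feature matrix
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

for name, model in [("DTC", DecisionTreeClassifier(random_state=0)),
                    ("RFC", RandomForestClassifier(n_estimators=100,
                                                   random_state=0))]:
    t0 = time.perf_counter()
    model.fit(X, y)                 # training time
    t1 = time.perf_counter()
    model.predict(X)                # prediction time
    t2 = time.perf_counter()
    print(f"{name}: train={t1 - t0:.3f}s "
          f"predict={t2 - t1:.3f}s total={t2 - t0:.3f}s")
```

`time.perf_counter` is preferred over `time.time` here because it is monotonic and has the highest available resolution for interval measurement.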
Table 11.
Complexity analysis of ML and DL model on DTI Datasets.
| Dataset | Model name | Training time (s) | Prediction time (s) | Total time (s) |
|---|---|---|---|---|
| BindingDB_Kd | DTC | 2.808 | 0.007 | 2.814 |
| | MLP | 57.502 | 0.047 | 57.548 |
| | RFC | 11.697 | 0.206 | 11.903 |
| | FCNN | 56.124 | 1.114 | 57.239 |
| | MHA-FCNN | 60.584 | 1.482 | 62.066 |
| | DeepLPI | 81.352 | 1.288 | 82.640 |
| | BarlowDTI | 129.176 | 1.293 | 130.469 |
| | Komet | 83.035 | 1.266 | 84.301 |
| BindingDB_Ki | DTC | 13.721 | 0.057 | 13.778 |
| | MLP | 553.865 | 0.194 | 554.059 |
| | RFC | 76.017 | 2.423 | 78.440 |
| | FCNN | 309.454 | 5.399 | 314.854 |
| | MHA-FCNN | 330.942 | 5.125 | 336.067 |
| | DeepLPI | 454.559 | 5.170 | 459.729 |
| | BarlowDTI | 731.753 | 5.097 | 736.850 |
| | Komet | 465.687 | 5.162 | 470.849 |
| BindingDB_IC50 | DTC | 52.293 | 0.163 | 52.456 |
| | MLP | 741.861 | 0.550 | 742.411 |
| | RFC | 244.949 | 7.019 | 251.967 |
| | FCNN | 872.200 | 14.611 | 886.811 |
| | MHA-FCNN | 913.661 | 14.015 | 927.676 |
| | DeepLPI | 1285.881 | 13.784 | 1299.665 |
| | BarlowDTI | 2115.441 | 13.698 | 2129.139 |
| | Komet | 1336.914 | 13.869 | 1350.783 |
Discussion
We have presented an analytical comparison of the performance of our proposed GAN+RFC model, which combines advanced feature extraction (MACCS keys and ACC) with data balancing via a GAN module and an RFC classifier, against state-of-the-art (SOTA) approaches for drug-target interaction (DTI) prediction. The evaluation, conducted across multiple benchmark datasets and metrics, highlights the efficacy of our method as summarized in Table 12. Competing methods include DeepLPI9, BarlowDTI12, Komet13, Ada-kNN-DTA10, and MDCT-DTA11. Our GAN+RFC model consistently outperforms SOTA models across three key datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. Notably, on the BindingDB-Kd dataset, the model achieves a sensitivity of 97.46%, specificity of 98.82%, and AUC-ROC of 99.42%, surpassing methods such as DeepLPI (AUC-ROC: 79.00) and Komet (AUC-ROC: 70.00). Although BarlowDTI improves on these with an AUC-ROC of 93.64, it still falls short of our method. Similarly, on the BindingDB-Ki dataset, our approach achieves a sensitivity of 91.69%, specificity of 93.40%, and AUC-ROC of 97.32%, outperforming Ada-kNN-DTA and MDCT-DTA, which reported high error values (an RMSE of 73.50 and an MSE of 47.50, respectively) without comparable classification metrics. For the BindingDB-IC50 dataset, our GAN+RFC model demonstrates a sensitivity of 95.40%, specificity of 96.42%, and AUC-ROC of 98.97, coupled with low MSE (4.60) and RMSE (21.46), significantly exceeding the benchmarks set by competing models such as Ada-kNN-DTA. The observed performance gains stem from the synergy of advanced feature extraction and GAN-enabled data balancing, which together capture complex molecular and target patterns. Moreover, the Random Forest Classifier contributes robust decision-making capabilities, effectively handling high-dimensional data and forming precise predictive boundaries. These features collectively establish a versatile and scalable framework, enabling accurate identification of drug-target interactions. 
In conclusion, our GAN+RFC model emerges as a highly effective tool for DTI prediction, outperforming existing methods in both classification and regression tasks across diverse datasets. Its demonstrated generalizability and robustness offer promising implications for computational drug discovery, paving the way for improved therapeutic development and pharmaceutical research.
Table 12.
Comparison of drug-target interaction prediction models with state-of-arts (SOTA) works.
| SI. No. | Author | Dataset | Model | Sensitivity | Specificity | AUC-ROC | MSE | RMSE |
|---|---|---|---|---|---|---|---|---|
| 1 | Wei et al.9 | BindingDB-Kd | DeepLPI | 68.40 | 77.30 | 79.00 | – | – |
| 2 | Schuh et al.12 | BindingDB-Kd | BarlowDTI | – | – | 93.64 | – | – |
| 3 | Guichaoua et al.13 | BindingDB-Kd | Komet | – | – | 70.00 | – | – |
| 4 | Our Proposed | BindingDB-Kd | GAN+RFC | 97.46 | 98.82 | 99.42 | 2.54 | 15.95 |
| 5 | Pei et al.10 | BindingDB-Ki | Ada-kNN-DTA | – | – | – | – | 73.50 |
| 6 | Zhu et al.11 | BindingDB-Ki | MDCT-DTA | – | – | – | 47.50 | – |
| 7 | Our Proposed | BindingDB-Ki | GAN+RFC | 91.69 | 93.40 | 97.32 | 8.31 | 28.82 |
| 8 | Pei et al.10 | BindingDB-IC50 | Ada-kNN-DTA | – | – | – | – | 67.50 |
| 9 | Pei et al.14 | BindingDB-IC50 | Ada-kNN-DTA | – | – | – | – | 73.50 |
| 10 | Our Proposed | BindingDB-IC50 | GAN+RFC | 95.40 | 96.42 | 98.97 | 4.60 | 21.46 |
Problem statements validation
VP1 (Integration of Chemical and Biological Information): Our paper introduces a dual feature extraction method, utilizing MACCS keys for drug structural features and amino acid/dipeptide compositions for target biomolecular properties, addressing this issue.
VP2 (Data Imbalance in Experimental Datasets): Our research employs Generative Adversarial Networks (GANs) to generate synthetic data for the minority class, significantly reducing false negatives and improving model sensitivity.
VP3 (Limitations of Traditional Drug Discovery Approaches): Our study highlights the inefficiencies of traditional methods and proposes a hybrid ML framework, incorporating the Random Forest Classifier (RFC) for scalable, high-dimensional data analysis.
VP4 (Threshold Optimization for Drug-Target Interaction): This research systematically evaluates threshold selection, ensuring reliable classification and improved accuracy. The GAN+RFC model demonstrates high-performance metrics, including ROC-AUC scores exceeding 97%, validating its effectiveness.
These validated aspects of the problem statement emphasize the necessity for innovative computational frameworks that address integration, imbalance, and scalability.
Validation of research questions
- VRQ1: How can the proposed ML model improve the prediction accuracy of drug-target interactions compared to traditional ML/DL models?
To address this research question, the proposed ML models, including RFC and the hybrid GAN+RFC model, were evaluated against traditional ML and DL models, such as DTC, RFC, MLP, FCNN, MHA-FCNN, DeepLPI, BarlowDTI and Komet. The comparison revealed that the GAN+RFC hybrid model outperforms traditional models, demonstrating superior prediction accuracy and robustness in DTI classification. Specifically, the integration of GAN for data augmentation effectively addresses the issue of class imbalance, leading to more accurate and reliable predictions. The GAN+RFC model consistently achieved higher accuracy, precision, and recall, surpassing the performance of DTC and MLP models, thereby validating the effectiveness of the proposed models in improving prediction accuracy for DTI (Tables 4, 5 and 6).
- VRQ2: What are the benefits of the GAN data balancing technique for DTI prediction?
The application of GAN for data balancing offers several significant benefits in DTI prediction. One of the primary challenges in DTI prediction is the class imbalance, where positive interactions (drug-target pairs) are often underrepresented compared to negative interactions. GAN-based data augmentation generates synthetic samples of underrepresented positive interactions, thereby balancing the class distribution and reducing bias in the model. This leads to more accurate predictions for positive interactions, as the model is trained on a more balanced and representative dataset (Table 9). The use of GAN has shown substantial improvements in classification performance, particularly in precision and recall for identifying drug-target interactions, confirming its value as a data balancing technique for DTI prediction.
- VRQ3: Can the proposed method enhance predictive performance and scalability in drug discovery?
The proposed method, combining GAN-based data augmentation with the RFC model, demonstrates significant improvements in both predictive performance and scalability in drug discovery. The hybrid approach achieves higher accuracy and robustness, especially when dealing with large and complex datasets (Tables 4, 5 and 6). By leveraging GAN to augment the training data, the method can handle class imbalances and improve generalization, leading to more reliable DTI predictions. Additionally, the use of RFC, a scalable ML algorithm, ensures that the model can be effectively applied to large-scale drug discovery efforts, where high volumes of data need to be processed efficiently. The results validate that the proposed method enhances not only the predictive performance but also the scalability, making it a promising solution for accelerating the drug discovery process.
Hypothesis validation
To validate the hypothesis, extensive experiments were conducted on the BindingDB_Kd, BindingDB_Ki, and BindingDB_IC50 datasets, evaluating the performance of the proposed model against traditional ML and DL models.
Accuracy Improvement: The proposed model (GAN+RFC) achieved significantly higher performance (e.g., ROC-AUC scores of 99.42% for BindingDB_Kd, 97.32% for BindingDB_Ki, and 98.97% for BindingDB_IC50) than traditional ML/DL models, demonstrating its superior capability in learning complex relationships between drug-target pairs.
Data Balancing via GANs: Using GANs to address class imbalance improved the model’s performance on minority class predictions, as evidenced by increased sensitivity and reduced false-negative rates.
Feature Representation: The integration of MACCS fingerprints and amino acid composition captured diverse and complementary information, leading to improved feature representation and classification performance.
Scalability: The hybrid model demonstrated consistent performance across datasets of varying sizes, validating its scalability and robustness.
These results confirm the hypothesis, highlighting the effectiveness of integrating advanced data balancing techniques, feature engineering, and hybrid ML approaches in predicting DTIs with high precision and reliability.
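The feature-representation point above can be illustrated by the shape of the fused vector: MACCS fingerprints are 167-bit structural keys for the drug, and ACC is a 20-dimensional composition vector for the target, giving a 187-dimensional drug-target pair representation. The values below are random stand-ins, not real descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the two feature views:
# MACCS fingerprints: 167-bit structural keys describing the drug
maccs_bits = rng.integers(0, 2, size=167)
# ACC: 20 amino-acid fractions describing the target (sums to 1)
acc_vec = rng.dirichlet(np.ones(20))

# Unified representation fed to the classifier for one drug-target pair
pair_features = np.concatenate([maccs_bits, acc_vec])
print(pair_features.shape)
```

Concatenation keeps the two views independent, letting the RFC learn interaction patterns that span structural and compositional features.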
Clinical applicability for drug repositioning
The proposed GAN+RFC model demonstrates not only its robustness and accuracy in predicting drug-target interactions (DTIs) but also its potential to contribute significantly to drug repositioning efforts, a crucial area in drug discovery. Drug repositioning, the process of identifying new therapeutic uses for existing drugs, is of particular importance for accelerating drug development timelines and reducing associated costs. The following discussion highlights the clinical applicability of our model in this context, along with practical examples to underscore its relevance to healthcare and pharmacology.
Guiding preclinical studies
The model’s ability to predict DTIs with high sensitivity and specificity provides a powerful tool for identifying potential off-target effects or new therapeutic targets of existing drugs. For instance:
- Example 1: Antiviral drugs such as Remdesivir, initially developed for hepatitis C, were repositioned for treating COVID-19. Using our model, such potential off-target interactions could be identified earlier by analyzing drug-target relationships across different disease pathways.
- Example 2: Statins, commonly used to lower cholesterol, have been found to exhibit anti-inflammatory properties. Our model could aid in predicting similar novel interactions, guiding preclinical studies to validate these findings.
By accurately predicting interactions, the model minimizes the experimental burden associated with screening vast chemical libraries, focusing preclinical studies on the most promising candidates.
Enhancing clinical studies
The predictions generated by the GAN+RFC model can also support clinical study design by prioritizing drug candidates with higher probabilities of success. For example:
- Example 1: Drugs identified as potential repositioning candidates for rare diseases can be prioritized, addressing the unmet need for orphan disease treatments.
- Example 2: Anti-cancer drugs can be repurposed to target specific mutations or pathways, allowing for personalized treatment strategies based on the predicted interactions with mutated proteins in cancer patients.
These insights could expedite the clinical validation phase, leading to faster approval processes for repurposed drugs.
In summary, the proposed GAN+RFC model offers substantial value not only in advancing DTI prediction but also in fostering translational research that bridges computational predictions with practical applications in preclinical and clinical drug discovery processes. This reinforces the significance for the healthcare and pharmacology communities, showcasing its potential to drive innovation in drug repositioning and precision medicine.
Conclusion
In this study, we have proposed a hybrid framework that effectively integrates advanced feature engineering, data balancing, and ML techniques to address the challenges in DTI prediction. The core contribution of our work is the combination of drug fingerprints (MACCS keys) and target compositions (amino acid sequences) into a unified feature representation, allowing for a more comprehensive capture of the biochemical and structural information of both drugs and targets. By employing GANs for data balancing, we successfully mitigated the issues arising from class imbalance, enhancing the model’s sensitivity and reducing false negatives. Our experimental results, evaluated on the BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50 datasets, demonstrate that the proposed GAN+RFC model outperforms existing SOTA methods in terms of sensitivity, specificity, and AUC-ROC. Specifically, our model achieves an AUC-ROC of 99.42% on the BindingDB-Kd dataset, 97.32% on the BindingDB-Ki dataset, and 98.97% on the BindingDB-IC50 dataset, outperforming other prominent DTI prediction models on these critical evaluation metrics. This approach not only enhances prediction accuracy but also provides a robust solution for DTI prediction that generalizes well across different datasets. Furthermore, our method is computationally efficient, making it suitable for large-scale applications in drug discovery.
While the proposed framework demonstrates significant improvements in DTI prediction, it has certain limitations. It does not incorporate transformer-based DL models, which are known for capturing long-range dependencies effectively. Additionally, advanced feature fusion techniques to combine diverse data representations and few-shot learning methods to address limited data scenarios were not included in this study.
In future research, we plan to integrate transformer-based DL models to enhance the ability to capture intricate relationships within drug and target data. Incorporating advanced feature fusion techniques will enable the model to better leverage complementary information from diverse data sources. Furthermore, the use of few-shot learning methods will allow the framework to perform effectively in scenarios with limited data, making it more robust and generalizable across various drug discovery applications.
Acknowledgements
The authors would like to extend their sincere appreciation to the Ongoing Research Funding Program (ORF-2025-301), King Saud University, Riyadh, Saudi Arabia.
Author contributions
Md. Alamin Talukder: Conceptualization, Data curation, Methodology, Software, Resource, Visualization, Investigation, Formal Analysis, Supervision, Writing-original draft and review & editing. Mohsin Kazi: Methodology, Resource, Formal Analysis, Visualization, Investigation, Validation, Supervision, Writing-review & editing. Ammar Alazab: Methodology, Resource, Formal Analysis, Visualization, Investigation, Validation, Writing-review & editing.
Funding
This research project was supported by the Ongoing Research Funding Program (ORF-2025-301), King Saud University, Riyadh, Saudi Arabia.
Data availability
The selected datasets are sourced from free and open-access sources, such as DTI Data: https://tdcommons.ai/multi_pred_tasks/dti/#bindingdb.
Declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
Not applicable
Consent to participate
Not applicable
Consent to Publish
Not applicable
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history
7/19/2025
The original online version of this Article was revised: The original version of this Article contained an error in Affiliation 2, which was incorrectly given as ‘Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. BOX-2457, Riyadh, 11451, Saudi Arabia’. The correct affiliation is ‘Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box-2457, Riyadh 11451, Saudi Arabia’. In addition, the grant number in the Acknowledgements and Funding section was incorrect. The correct Acknowledgements section now reads: “The authors would like to extend their sincere appreciation to the Ongoing Research Funding Program (ORF-2025-301), King Saud University, Riyadh, Saudi Arabia”. The correct Funding section now reads: “This research project was supported by the Ongoing Research Funding Program (ORF-2025-301), King Saud University, Riyadh, Saudi Arabia”.
Contributor Information
Md. Alamin Talukder, Email: alamin.cse@iubat.edu.
Mohsin Kazi, Email: mkazi@ksu.edu.sa.
Ammar Alazab, Email: ammar.alazab@torrens.edu.au.