Computational Intelligence and Neuroscience
. 2022 Sep 28;2022:2987407. doi: 10.1155/2022/2987407

DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform

Farman Ali 1,, Omar Barukab 2, Ajay B Gadicha 3, Shruti Patil 4, Omar Alghushairy 5, Akram Y Sarhan 6
PMCID: PMC9534628  PMID: 36211019

Abstract

DNA-binding proteins (DBPs) have crucial biotic activities, including DNA replication, recombination, and transcription. DBPs are closely associated with chronic diseases and are used in the manufacturing of antibiotics and steroids. A series of predictors has been established to identify DBPs; however, researchers are still working to further enhance their identification. This research designed a novel predictor to identify DBPs more accurately. The features of the sequences are transformed by F-PSSM (filtered position-specific scoring matrix), PSSM-DPC (position-specific scoring matrix-dipeptide composition), and R-PSSM (reduced position-specific scoring matrix). To eliminate noisy attributes, we extended DWT (discrete wavelet transform) to F-PSSM, PSSM-DPC, and R-PSSM and introduced three novel descriptors, namely, F-PSSM-DWT, PSSM-DPC-DWT, and R-PSSM-DWT. The training of the four models was then performed using LiXGB (light eXtreme gradient boosting), XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and Adaboost. LiXGB with R-PSSM-DWT attained 6.55% higher accuracy on the training dataset and 5.93% on the testing dataset than the best existing predictors. The results reveal the excellent performance of our novel predictor over past studies. DBP-iDWT would be fruitful for establishing more operative therapeutic strategies for fatal disease treatment.

1. Introduction

DNA-binding proteins perform many crucial activities, such as DNA transcription, repair, translation, and damage detection [1]. DBPs are directly encoded in about 2–5% of prokaryotic and 6–7% of eukaryotic genomes [2]. Several DBPs are responsible for gene transcription and replication, and some DBPs shape DNA into a specific structure called chromatin [3]. Research on DBPs is significant for the treatment of diverse fatal diseases and the production of drugs. For instance, nuclear receptors are the key components of the tamoxifen and bicalutamide medicines used in cancer treatment. Similarly, glucocorticoid receptors participate in the production of dexamethasone, which is utilized in the treatment of autoimmune diseases, inflammation, allergies, and asthma [4–6]. Furthermore, inhibitor of DNA-binding (ID) proteins are closely related to tumor-associated processes, including chemoresistance, tumorigenesis, and angiogenesis. In addition, ID proteins are also directly implicated in lung, cervical, and prostate cancers [7].

Protein sequences are growing rapidly in online databases. A series of predictors has been developed for diverse biological problems, including iRNA-PseTNC [8], iACP-GAEnsC [9], cACP-2LFS [10], DP-BINDER [11], Deep-AntiFP [12], cACP [13], iAtbP-Hyb-EnC [14], iAFPs-EnC-GA [15], and cACP-DeepGram [16]. Predicting DBPs by computational approaches is in high demand. Several predictors have been introduced using primary sequential information and structural features. Structure-based predictors produce good prediction results, but structural features are not available for all proteins. Some of the structure-based protocols are iDBPs [17], DBD-Hunter [18], and Seq(DNA) [19]. Sequence-based systems, developed using sequential information, are more convenient and easier to employ for large datasets. Therefore, many sequence-based systems have been adopted for DNA-binding protein identification, among them DBP-DeepCNN [20], DNA-Prot [21], iDNA-Prot [22], iDNA-Prot|dis [23], Kmer1 + ACC [24], Local-DPP [25], DBPPred-PDSD [26], DPP-PseAAC [27], and StackDPPred [28]. Li et al. extracted features by a convolutional neural network (CNN) and Bi-LSTM [29]. Zhao et al. analyzed protein features by six methods and performed classification with XGBoost [30]. Each computational method contributed well to enhancing the prediction of DBPs; however, more effort is needed to improve it further. Considering this, a new method (DBP-iDWT) is established to identify DBPs accurately. The contributions of our research are as follows:

  1. Designed three new feature descriptors, i.e., F-PSSM-DWT, PSSM-DPC-DWT, and R-PSSM-DWT

  2. LiXGB is applied for model training and prediction

  3. Constructed a new computational model (DBP-iDWT) for improving DBPs identification

In addition to LiXGB, the feature sets are fed into three classification algorithms, namely, ERT, XGB, and Adaboost. The efficacy of each classifier was assessed with a ten-fold cross-validation test, while the generalization capability was assessed with a testing set. LiXGB using R-PSSM-DWT secured higher prediction outcomes than past methods. The flowchart of DBP-iDWT is depicted in Figure 1.

Figure 1. Architecture of the proposed model.

The rest of the manuscript comprises three parts. Section 2 provides details regarding the datasets and methodologies; Section 3 illustrates the performance of the classifiers; and Section 4 summarizes the conclusion.

2. Materials and Methods

2.1. Selection of Datasets

We selected two datasets from previous work [31]. One dataset (PDB14189) is employed for model training, and the other is deployed as a testing dataset. PDB14189 was collected from the UniProt database [32]. To design a standard dataset, sequences with more than 25% similarity were removed with the CD-HIT toolkit. The final training dataset comprises 7129 DBPs and 7060 non-DBPs. The independent set was retrieved by the procedure explained in reference [33]: similar sequences were removed with a 25% cutoff value. The final testing dataset contains 1153 DBPs and 1119 non-DBPs.

2.2. Feature Descriptors

In this work, the patterns are discovered with PSSM-DPC-DWT, F-PSSM-DWT, and R-PSSM-DWT. These approaches are elaborated in the following subsections.

2.2.1. Position-specific Scoring Matrix (PSSM)

Recently, evolutionary features have been successfully implemented and have improved the prediction results of many predictors [1, 20]. We also implemented PSSM for the formulation of evolutionary patterns. Each sequence is searched against the NCBI database using the PSI-BLAST program to align homologous sequences [34].
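As a concrete illustration of this step, the sketch below assembles a BLAST+ psiblast invocation that writes an ASCII PSSM for one query sequence. The file names, database choice, and the iteration and E-value settings are assumptions for illustration; the paper only states that PSI-BLAST is run against an NCBI database.

```python
def build_psiblast_cmd(fasta_path, db_path, pssm_path,
                       iterations=3, evalue=0.001):
    """Assemble a BLAST+ psiblast command that emits an ASCII PSSM.

    Paths and parameter values here are illustrative assumptions;
    the paper does not specify them.
    """
    return [
        "psiblast",
        "-query", fasta_path,            # one protein sequence in FASTA
        "-db", db_path,                  # local NCBI database, e.g. swissprot
        "-num_iterations", str(iterations),
        "-evalue", str(evalue),
        "-out_ascii_pssm", pssm_path,    # the L x 20 profile used downstream
    ]

cmd = build_psiblast_cmd("query.fasta", "swissprot", "query.pssm")
# import subprocess; subprocess.run(cmd, check=True)  # needs BLAST+ installed
```

The resulting `query.pssm` file holds the L × 20 score matrix consumed by the descriptors below.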

The PSSM can be denoted as follows:

$$\mathrm{PSSM} = [P_1, P_2, \ldots, P_j, \ldots, P_{20}]^{T}, \qquad P_j = [P_{1,j}, P_{2,j}, \ldots, P_{L,j}], \quad j = 1, 2, \ldots, 20, \tag{1}$$

where $T$ and $P_{i,j}$ indicate the transpose operator and the score of amino acid type $j$ at the $i$th position of the query sequence, respectively.

2.2.2. Filtered Position-specific Scoring Matrix (F-PSSM)

PSSM transforms the evolutionary patterns into numerical form. It may comprise some negative scores, which can lead to similar feature vectors being generated for different sequences. To cope with this hurdle, F-PSSM filters out all the negative scores in a preprocessing step. Details of the dimension formulation are provided in [35].
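The filtering step can be sketched in a few lines: every negative PSSM score is replaced with zero before any features are aggregated (a minimal illustration of the preprocessing only; the full dimension formulation follows [35]).

```python
def filter_pssm(pssm):
    """F-PSSM preprocessing: clamp every negative PSSM score to 0.

    `pssm` is an L x 20 matrix given as a list of rows.
    """
    return [[max(score, 0) for score in row] for row in pssm]

# Toy 2 x 3 profile, just to show the effect of the filter.
filtered = filter_pssm([[-2, 5, 0], [3, -1, 7]])
```

Because negative scores can no longer cancel positive ones, two different sequences are less likely to collapse into similar feature vectors.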

2.2.3. Position-specific Scoring Matrix-Dipeptide Composition (PSSM-DPC)

The local sequence-order patterns contain informative features, which are explored by incorporating DPC into the PSSM. DPC calculates the frequency of consecutive amino acid pairs and produces a 400-dimensional descriptor [36]. PSSM-DPC is calculated as follows:

$$\mathrm{PSSM\text{-}DPC} = [G_{1,1}, \ldots, G_{1,20}, G_{2,1}, \ldots, G_{2,20}, \ldots, G_{20,1}, \ldots, G_{20,20}]^{T}, \tag{2}$$

where

$$G_{i,j} = \frac{1}{L}\sum_{k=1}^{L-1} P_{k,i} \times P_{k+1,j}, \quad 1 \le i, j \le 20. \tag{3}$$
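Equations (2) and (3) can be sketched directly: for each of the 400 ordered column pairs (i, j), the products of position k in column i and position k + 1 in column j are accumulated and scaled by 1/L (a plain illustration of the descriptor, not the authors' code).

```python
def pssm_dpc(pssm):
    """PSSM-DPC: 400-D dipeptide composition of an L x 20 PSSM (eq. (3))."""
    L = len(pssm)
    features = []
    for i in range(20):
        for j in range(20):
            total = sum(pssm[k][i] * pssm[k + 1][j] for k in range(L - 1))
            features.append(total / L)
    return features

# Toy 3-residue profile with constant rows, so every column pair behaves alike.
vec = pssm_dpc([[1] * 20, [2] * 20, [3] * 20])
```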

2.2.4. Reduced Position Specific Scoring Matrix (R-PSSM)

It is believed that considerable similarities exist among the 20 unique amino acids. Based on these similarities, researchers have categorized the residues into groups. Li et al. [37] suggested that the following groups can be formed according to residue similarity:

$$G_i = \begin{cases} Y, & \text{if } i \in \{F, Y, W\},\\ L, & \text{if } i \in \{M, L\},\\ V, & \text{if } i \in \{I, V\},\\ S, & \text{if } i \in \{A, T, S\},\\ N, & \text{if } i \in \{N, H\},\\ E, & \text{if } i \in \{Q, E, D\},\\ K, & \text{if } i \in \{R, K\},\\ i, & \text{otherwise}. \end{cases} \tag{4}$$

Using the rule of Li et al., the L × 20 PSSM is converted into an L × 10 matrix by the following equations:

$$G_1 = \frac{F + Y + W}{3},\; G_2 = \frac{M + L}{2},\; G_3 = \frac{I + V}{2},\; G_4 = \frac{A + T + S}{3},\; G_5 = \frac{N + H}{2},\; G_6 = \frac{Q + E + D}{3},\; G_7 = \frac{R + K}{2},\; G_8 = C,\; G_9 = G,\; G_{10} = P. \tag{5}$$

If $r_1 r_2 r_3 \ldots r_L$ is a given protein sequence, then its reduced PSSM (R-PSSM) is written as follows:

$$RP = \begin{array}{c|cccc} & 1 & 2 & \cdots & 10\\ \hline r_1 & R_{1,1} & R_{1,2} & \cdots & R_{1,10}\\ r_2 & R_{2,1} & R_{2,2} & \cdots & R_{2,10}\\ \vdots & \vdots & \vdots & & \vdots\\ r_L & R_{L,1} & R_{L,2} & \cdots & R_{L,10} \end{array} \tag{6}$$

A 110-dimensional feature vector is obtained from RP.
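The column merging of equation (5) can be sketched as follows. The PSSM column order is assumed here to be the standard PSI-BLAST order `ARNDCQEGHILKMFPSTWYV`; check the header of your own PSSM files before reusing this.

```python
# Assumed PSI-BLAST column order; verify against your PSSM header.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
COL = {aa: i for i, aa in enumerate(AA_ORDER)}

# Residue groups of equation (5); each R-PSSM column averages its members.
GROUPS = ["FYW", "ML", "IV", "ATS", "NH", "QED", "RK", "C", "G", "P"]

def reduce_pssm_row(row):
    """Collapse one 20-score PSSM row into the 10 R-PSSM scores."""
    return [sum(row[COL[aa]] for aa in group) / len(group) for group in GROUPS]

def r_pssm(pssm):
    """Apply the reduction row by row: L x 20 -> L x 10."""
    return [reduce_pssm_row(row) for row in pssm]

reduced = reduce_pssm_row(list(range(20)))  # one synthetic row 0..19
```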

2.2.5. Discrete Wavelet Transform

To retain only salient information, compression approaches such as DWT are applied in many research areas. DWT is used for signal compression and denoising [38, 39]. DWT divides a signal into low-frequency and high-frequency components [40]. The low frequencies carry more important information than the high frequencies [41]. The low-frequency components are further split into low and high sub-bands to obtain discriminative patterns. DWT is computed as follows:

$$X(m, n) = \frac{1}{\sqrt{m}} \int f(y)\, \Psi\!\left(\frac{y - n}{m}\right) dy, \tag{7}$$

where $m$ represents the scale variable, $n$ the translation variable, and $X(m, n)$ the transform coefficient. The low and high frequencies of a signal $f(t)$ are computed as follows:

$$C_{i,\text{low}}[a] = \sum_{k=1}^{N} s[k]\, L[2a - k], \qquad C_{i,\text{high}}[a] = \sum_{k=1}^{N} s[k]\, H[2a - k], \tag{8}$$

where $C_{i,\text{high}}[a]$ and $C_{i,\text{low}}[a]$ are the high- and low-frequency components of the signal, and $H$, $s[k]$, and $L$ represent the high-pass filter, the discrete signal, and the low-pass filter, respectively.

To obtain only important features and eliminate less informative and noisy patterns, DWT is extended to F-PSSM, PSSM-DPC, and R-PSSM, splitting each into low and high frequencies up to two levels. Finally, the novel feature descriptors PSSM-DPC-DWT, R-PSSM-DWT, and F-PSSM-DWT are constructed. The dimension of each feature set is 512 after applying DWT. Figure 2 depicts the schematic view of the two-level DWT.

Figure 2. 2-level structure of DWT.
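The two-level scheme of Figure 2 can be sketched with a Haar-style filter pair (an assumption: the paper does not name the mother wavelet). Each level halves the signal into an averaged low band and a differenced high band, and only the low band is split again.

```python
def haar_step(signal):
    """One DWT level: averaged low band and differenced high band."""
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return low, high

def two_level_dwt(signal):
    """Split the signal, then re-split only the low band (Figure 2)."""
    low1, high1 = haar_step(signal)
    low2, high2 = haar_step(low1)
    return low2, high2, high1

# An 8-sample toy signal; low2 keeps the coarse trend, the high bands the detail.
low2, high2, high1 = two_level_dwt([4, 2, 6, 8, 3, 1, 7, 5])
```

Applied to a feature profile, the second-level approximation band is what survives as the compressed, denoised descriptor.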

2.3. Light eXtreme Gradient Boosting

During the establishment of a predictor, model training is performed by a classifier. The gradient boosting machine (GBM) classifier uses decision trees for the construction of a model, and model performance is improved with a loss function [42]. Unlike GBM, eXtreme Gradient Boosting (XGB) employs an objective function that combines the loss function with regularization to control model complexity, and it performs parallel computations to optimize computational speed. Building on these benefits of XGB, Light eXtreme Gradient Boosting (LiXGB) was proposed [43]. LiXGB possesses additional advantages, such as lower memory usage, higher efficiency, and faster model training, which improve model performance, and it minimizes the training time on large datasets. We tuned the hyperparameters max depth, estimator, eta, lambda, and alpha: "eta" controls the learning rate, "estimator" sets the number of trees, "max depth" limits the tree depth, "alpha" (L1 regularization) shrinks weights in high-dimensional feature spaces, and "lambda" (L2 regularization) helps avoid overfitting. Other parameters were kept at their defaults. These hyperparameters are summarized in Table 1.

Table 1.

Applied parameters with values.

Parameter Value
Eta 0.1
No. of estimators 500
Alpha 1
Lambda 1
Max depth 8
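Mapped onto LightGBM's scikit-learn API, the Table 1 settings would look roughly as below. The parameter-name mapping (e.g., "eta" → `learning_rate`, "alpha" → `reg_alpha`) is an assumption, since the paper lists the names generically.

```python
# Table 1 values under LightGBM's scikit-learn parameter names (assumed mapping).
LIXGB_PARAMS = {
    "learning_rate": 0.1,  # "eta": step size of the boosting updates
    "n_estimators": 500,   # number of boosted trees
    "reg_alpha": 1,        # "alpha": L1 regularization
    "reg_lambda": 1,       # "lambda": L2 regularization against overfitting
    "max_depth": 8,        # limit on individual tree depth
}

# With LightGBM installed, training would look roughly like:
# from lightgbm import LGBMClassifier
# model = LGBMClassifier(**LIXGB_PARAMS).fit(X_train, y_train)
```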

2.4. Proposed Model Validation Methodologies

Model performance is examined by different validation approaches. The commonly used validation methods are k-fold cross-validation and jackknife [44–47]; however, the jackknife is time-consuming and costly [48–50]. During 10-fold cross-validation, the training set is split into 10 folds: 9 folds are used for model training and 1 fold for model validation. This process is repeated 10 times so that each fold is used for testing exactly once, and the final prediction is the average over all tested folds [51–54]. The performance of the current work is evaluated with the 10-fold test and five indices, i.e., specificity (Sp), F-measure, sensitivity (Sn), accuracy (Acc), and the Matthews correlation coefficient (MCC) [55–58]. These parameters are computed as follows:

$$\begin{aligned}
\mathrm{Acc} &= 1 - \frac{H_+^- + H_-^+}{H^+ + H^-}, \qquad
\mathrm{Sn} = 1 - \frac{H_+^-}{H^+}, \qquad
\mathrm{Sp} = 1 - \frac{H_-^+}{H^-},\\[4pt]
\mathrm{MCC} &= \frac{1 - \left(\dfrac{H_+^-}{H^+} + \dfrac{H_-^+}{H^-}\right)}
{\sqrt{\left(1 + \dfrac{H_-^+ - H_+^-}{H^+}\right)\left(1 + \dfrac{H_+^- - H_-^+}{H^-}\right)}},\\[4pt]
\mathrm{F1\ Score} &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{Precision} = \frac{H^+ - H_+^-}{H^+ - H_+^- + H_-^+}, \qquad
\mathrm{Recall} = \frac{H^+ - H_+^-}{H^+},
\end{aligned} \tag{9}$$

where $H^+$ denotes the total number of DBPs, $H^-$ the total number of non-DBPs, $H_-^+$ the number of non-DBPs mistakenly predicted as DBPs, and $H_+^-$ the number of DBPs misclassified as non-DBPs.
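In terms of the usual confusion-matrix counts (TP = $H^+ - H_+^-$, FN = $H_+^-$, TN = $H^- - H_-^+$, FP = $H_-^+$), the five indices of equation (9) can be sketched as:

```python
import math

def evaluation_indices(tp, fn, tn, fp):
    """Acc, Sn, Sp, MCC, and F1 of equation (9) from confusion counts."""
    h_pos, h_neg = tp + fn, tn + fp          # total DBPs and non-DBPs
    acc = 1 - (fn + fp) / (h_pos + h_neg)
    sn = 1 - fn / h_pos
    sp = 1 - fp / h_neg
    mcc = (1 - (fn / h_pos + fp / h_neg)) / math.sqrt(
        (1 + (fp - fn) / h_pos) * (1 + (fn - fp) / h_neg))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, sn, sp, mcc, f1
```

This form of the MCC is algebraically identical to the familiar $(TP \cdot TN - FP \cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$ expression.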

3. Results and Discussion

In this section, we elaborate the results obtained by the learning algorithms using the feature sets extracted from the training and testing sequences.

3.1. Results of Feature Encoders before DWT

Table 2 reports the outcomes of F-PSSM, PSSM-DPC, and R-PSSM. The performance of each individual descriptor is analyzed by the 10-fold test and the assessment indices. On F-PSSM, the accuracies secured by LiXGB, XGB, ERT, and Adaboost are 76.60%, 74.57%, 75.18%, and 71.52%, respectively; among all classifiers, LiXGB achieved the best accuracy. On PSSM-DPC, all classifiers enhanced the prediction results, generating accuracies of 83.54%, 81.53%, 79.22%, and 80.05% for LiXGB, XGB, ERT, and Adaboost, respectively. Similarly, the classifiers also improved performance on the R-PSSM descriptor across all evaluation parameters, with LiXGB attaining the highest accuracy (83.62%). These predictions indicate that LiXGB possesses higher learning power compared with XGB, ERT, and Adaboost.

Table 2.

Results of encoders before DWT.

Model Encoder Acc (%) Sn (%) Sp (%) MCC (%)
Adaboost F-PSSM 71.52 80.42 62.54 43.67
PSSM-DPC 80.05 78.44 81.69 60.15
R-PSSM 80.07 76.15 84.02 60.35

ERT F-PSSM 75.18 84.74 58.97 44.56
PSSM-DPC 79.22 73.18 85.31 58.42
R-PSSM 79.56 74.99 84.18 59.40

XGB F-PSSM 74.57 82.17 66.90 49.67
PSSM-DPC 81.53 76.15 86.97 63.47
R-PSSM 81.63 76.48 86.84 63.64

LiXGB F-PSSM 76.60 82.47 66.01 48.75
PSSM-DPC 83.54 84.61 82.46 67.10
R-PSSM 83.62 82.30 84.96 67.27

3.2. Results of Feature Encoders after DWT

The features extracted by the representative methods may contain noisy, redundant, or less informative components. To avoid such features, DWT is applied to F-PSSM, PSSM-DPC, and R-PSSM. DWT retains the informative patterns and improves model performance. After applying DWT, we obtain F-PSSM-DWT, PSSM-DPC-DWT, and R-PSSM-DWT. Each feature set is fed into Adaboost, ERT, XGB, and LiXGB to examine performance over these descriptors, and the results are summarized in Table 3. With the 10-fold test on F-PSSM-DWT, Adaboost, ERT, XGB, and LiXGB produced accuracies of 73.20%, 77.26%, 75.37%, and 79.40%, which are 1.68%, 2.08%, 0.80%, and 2.80% higher than those with F-PSSM, respectively. Similarly, the classifiers also boosted performance on PSSM-DPC-DWT across all evaluation parameters. Furthermore, with R-PSSM-DWT, Adaboost, ERT, XGB, and LiXGB enhanced their accuracies by 2.16%, 3.49%, 1.98%, and 3.22% over R-PSSM. These results demonstrate that all classifiers improve after applying DWT. Among all feature descriptors, the best results are secured by R-PSSM-DWT.

Table 3.

Results of feature encoders after DWT.

Model Encoder Acc (%) Sn (%) Sp (%) MCC (%)
Adaboost F-PSSM-DWT 73.20 82.35 56.67 40.21
PSSM-DPC-DWT 81.81 80.45 83.19 63.66
R-PSSM-DWT 82.23 77.68 86.81 64.77

ERT F-PSSM-DWT 77.26 79.91 74.59 54.58
PSSM-DPC-DWT 81.53 76.15 86.97 63.47
R-PSSM-DWT 83.05 81.30 84.82 66.15

XGB F-PSSM-DWT 75.37 83.43 60.81 45.31
PSSM-DPC-DWT 82.45 83.65 81.25 64.91
R-PSSM-DWT 83.61 82.66 84.56 67.23

LiXGB F-PSSM-DWT 79.40 83.11 75.65 58.94
PSSM-DPC-DWT 84.74 84.30 85.19 69.49
R-PSSM-DWT 86.84 86.60 87.08 73.69

LiXGB has consistently outperformed the other classifiers, generating 3.23%, 3.79%, and 4.61% higher accuracies than XGB, ERT, and Adaboost with R-PSSM-DWT. It is concluded that the performance of LiXGB is superior to that of the other classifiers.

3.3. Comparison with Existing Predictors Using Training Set

Several methods have been implemented for the identification of DBPs. The proposed work is compared with past studies, including iDNA-Prot [22], iDNA-Prot|dis [23], TargetDBP [59], MsDBP [60], PDBP-CNN [29], and XGBoost [30], and the results are summarized in Table 4. Our proposed study improved accuracy by 4.82%, specificity by 10.58%, and MCC by 0.09 over the best existing predictor (PDBP-CNN). Similarly, DBP-iDWT improved Acc by 5.42%, Sn by 2.49%, Sp by 8.65%, and MCC by 0.11 over the second-best study (XGBoost). In the same fashion, our predictor is superior to past studies on all four assessment parameters. These outcomes verify that DBP-iDWT can discriminate DBPs with high precision.

Table 4.

Comparative analysis with past work on the training set.

Predictor Acc (%) Sn (%) Sp (%) MCC
iDNA-prot 75.40 83.81 64.73 0.50
iDNA-prot|dis 77.30 79.40 75.27 0.54
TargetDBP 79.71 79.56 79.85 0.59
MsDBP 80.29 80.87 79.72 0.60
PDBP-CNN 82.02 87.49 76.50 0.64
XGBoost 81.42 84.11 78.43 0.62
DBP-iDWT 86.84 86.60 87.08 0.73

3.4. Comparison with Past Predictors Using Independent Set

A method is considered effective if it generalizes well to new sequences. We therefore also evaluated the proposed work on a testing dataset. The results are compared with past studies, including PseDNA-Pro, iDNAPro-PseAAC, iDNAProt-ES, DPP-PseAAC, TargetDBP, MsDBP, and PDBP-Fusion, as noted in Table 5. Our predictor (DBP-iDWT) raised Acc by 5.06%, Sn by 17.06%, Sp by 8.22%, and MCC by 0.10 over PDBP-Fusion. Similarly, DBP-iDWT improved Acc by 6.14%, Sn by 14.02%, and MCC by 0.13 over TargetDBP. The proposed study also secured higher prediction results than the other past methods in Table 5.

Table 5.

Comparative analysis with past work using testing dataset.

Predictor Acc (%) Sn (%) Sp (%) MCC
PseDNA-pro 67.23 78.38 56.08 0.35
iDNAPro-PseAAC 66.22 78.37 54.05 0.33
iDNAProt-ES 68.58 95.95 41.22 0.44
DPP-PseAAC 61.15 55.41 66.89 0.22
TargetDBP 76.69 76.35 77.03 0.53
MsDBP 66.99 70.69 63.18 0.33
PDBP-fusion 77.77 73.31 66.85 0.56
DBP-iDWT 82.83 90.37 75.07 0.66

This results analysis confirms that the incorporation of DWT into R-PSSM, in conjunction with LiXGB, can identify DBPs more accurately. Past studies have reported that selecting the best features can improve model performance [61–63]. In this study, we also implemented feature selection approaches, including mRMR and SVM-RFE; however, no improvement in model performance was observed.

4. Conclusion and Future Vision

DBPs play an active role in many biological functions and in drug design. We have designed a predictor for improving DBP prediction with high precision. The global information, local features, sequence-order patterns, and correlated factors are explored by PSSM-DPC-DWT, R-PSSM-DWT, and F-PSSM-DWT.

The models are trained with LiXGB, XGB, ERT, and Adaboost. It is concluded that R-PSSM-DWT with LiXGB effectively attained superior performance to the other predictors. The successful outcomes of the proposed study are due to factors such as the utilization of effective descriptors, the application of a compression scheme, and an appropriate classifier.

DBP-iDWT will be effective for the identification of DBPs due to its prediction power, which is more promising than that of other predictors, and will play an active role in drug development. DBP-iDWT would be fruitful for establishing more operative therapeutic strategies for fatal disease treatment. In addition, we will apply advanced deep learning frameworks [64–67] in our future work to further improve DBP prediction.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant no. RGP.2/198/43.

Data Availability

The data and code are freely available at https://github.com/Farman335/DBP-DWTPred.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors' Contributions

Farman Ali: Conceptualization, Methodology. Harish Kumar and Shruti Patil: data collection, writing-original draft preparation. Omar Barukab and Ajay B Gadicha: Visualization, performed experiments suggested by reviewers. Omar Alghushairy and Akram Y Sarhan: Code writing, editing, and reviewed the paper.

References

  • 1.Ahmed S., Kabir M., Ali Z., Arif M., Ali F., Yu D.-J. An integrated feature selection algorithm for cancer classification using gene expression data. Combinatorial Chemistry and High Throughput Screening . 2018;21:631–645. doi: 10.2174/1386207322666181220124756. [DOI] [PubMed] [Google Scholar]
  • 2.Luscombe N. M., Austin S. E., Berman H. M., Thornton J. M. An overview of the structures of protein-DNA complexes. Genome Biology . 2000;1 doi: 10.1186/gb-2000-1-1-reviews001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Sandman K., Pereira S. L., Reeve J. N. Diversity of prokaryotic chromosomal proteins and the origin of the nucleosome. Cellular and Molecular Life Sciences . 1998;54(12):1350–1364. doi: 10.1007/s000180050259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Overington J. P., Al-Lazikani B., Hopkins A. L. How many drug targets are there. Nature Reviews Drug Discovery . 2006;5(12):993–996. doi: 10.1038/nrd2199. [DOI] [PubMed] [Google Scholar]
  • 5.Gronemeyer H., Gustafsson J. A, Laudet V. Principles for modulation of the nuclear receptor superfamily. Nature Reviews Drug Discovery . 2004;3(11):950–964. doi: 10.1038/nrd1551. [DOI] [PubMed] [Google Scholar]
  • 6.Hudson W. H., Vera I. M. S. d., Nwachukwu J. C., et al. Cryptic glucocorticoid receptor-binding sites pervade genomic NF-κB response elements. Nature Communications . 2018;9(1):p. 1337. doi: 10.1038/s41467-018-03780-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Sikder H. A., Devlin M. K., Dunlap S., Ryu B., Alani R. M. Id proteins in cell growth and tumorigenesis. Cancer Cell . 2003;3(6):525–530. doi: 10.1016/s1535-6108(03)00141-7. [DOI] [PubMed] [Google Scholar]
  • 8.Akbar S., Hayat M., Iqbal M., Tahir M. iRNA-PseTNC: identification of RNA 5-methylcytosine sites using hybrid vector space of pseudo nucleotide composition. Frontiers of Computer Science . 2020;14(2):451–460. doi: 10.1007/s11704-018-8094-9. [DOI] [Google Scholar]
  • 9.Akbar S., Hayat M., Iqbal M., Jan M. A. iACP-GAEnsC: evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artificial Intelligence in Medicine . 2017;79:62–70. doi: 10.1016/j.artmed.2017.06.008. [DOI] [PubMed] [Google Scholar]
  • 10.Akbar S., Hayat M., Tahir M., Chong K. T. cACP-2LFS: classification of anticancer peptides using sequential discriminative model of KSAAP and two-level feature selection approach. IEEE Access . 2020;8:131939–131948. doi: 10.1109/access.2020.3009125. [DOI] [Google Scholar]
  • 11.Ali F., Ahmed S., Swati Z. N. K., Akbar S. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. Journal of Computer-Aided Molecular Design . 2019;33(7):645–658. doi: 10.1007/s10822-019-00207-x. [DOI] [PubMed] [Google Scholar]
  • 12.Ahmad A., Akbar S., Khan S., et al. Deep-AntiFP: prediction of antifungal peptides using distanct multi-informative features incorporating with deep neural networks. Chemometrics and Intelligent Laboratory Systems . 2021;208 doi: 10.1016/j.chemolab.2020.104214.104214 [DOI] [Google Scholar]
  • 13.Akbar S., Rahman A. U., Hayat M., Sohail M. CACP: classifying anticancer peptides using discriminative intelligent model via Chou’s 5-step rules and general pseudo components. Chemometrics and Intelligent Laboratory Systems . 2020;196 doi: 10.1016/j.chemolab.2019.103912.103912 [DOI] [Google Scholar]
  • 14.Akbar S., Ahmad A., Hayat M., Rehman A. U., Khan S., Ali F. iAtbP-hyb-EnC: prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Computers in Biology and Medicine . 2021;137 doi: 10.1016/j.compbiomed.2021.104778.104778 [DOI] [PubMed] [Google Scholar]
  • 15.Ahmad A., Akbar S., Tahir M., Hayat M., Ali F. iAFPs-EnC-GA: Identifying Antifungal Peptides Using Sequential and Evolutionary Descriptors Based Multi-Information Fusion and Ensemble Learning Approach. Chemometrics and Intelligent Laboratory Systems . 2022;222 doi: 10.1016/j.chemolab.2022.104516.104516 [DOI] [Google Scholar]
  • 16.Akbar S., Hayat M., Tahir M., Khan S., Alarfaj F. K. cACP-DeepGram: classification of anticancer peptides via deep neural network and skip-gram-based word embedding model. Artificial Intelligence in Medicine . 2022;131 doi: 10.1016/j.artmed.2022.102349.102349 [DOI] [PubMed] [Google Scholar]
  • 17.Nimrod G., Schushan M., Szilágyi A., Leslie C., Ben-Tal N. iDBPs: a web server for the identification of DNA binding proteins. Bioinformatics . 2010;26(5):692–693. doi: 10.1093/bioinformatics/btq019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Gao M., Skolnick J. DBD-Hunter: a knowledge-based method for the prediction of DNA–protein interactions. Nucleic Acids Research . 2008;36(12):3978–3992. doi: 10.1093/nar/gkn332. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Zhao H., Wang J., Zhou Y., Yang Y. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome. PLoS One . 2014;9(5) doi: 10.1371/journal.pone.0096694.e96694 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Ali F., Kumar H., Patil S., Ahmed A., Banjar A., Daud A. DBP-DeepCNN: Prediction of DNA-Binding Proteins Using Wavelet-Based Denoising and Deep Learning. Chemometrics and Intelligent Laboratory Systems . 2022;229 doi: 10.1016/j.chemolab.2022.104639.104639 [DOI] [Google Scholar]
  • 21.Kumar K. K., Pugalenthi G., Suganthan P. N. DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest. Journal of Biomolecular Structure and Dynamics . 2009;26(6):679–686. doi: 10.1080/07391102.2009.10507281. [DOI] [PubMed] [Google Scholar]
  • 22.Lin W.-Z., Fang J.-A., Xiao X., Chou K.-C. IDNA-Prot: identification of DNA binding proteins using random forest with grey model. PLoS One . 2011;6(9) doi: 10.1371/journal.pone.0024756.e24756 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Liu B., Xu J., Lan X., et al. IDNA-Prot| dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS One . 2014;9 doi: 10.1371/journal.pone.0106691.e106691 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Dong Q., Wang S., Wang K., Liu X., Liu B. Identification of DNA-binding proteins by auto-cross covariance transformation. Proceedings of the Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); November 2015; Washington DC, USA. pp. 470–475. [Google Scholar]
  • 25.Wei L., Tang J., Zou Q. Local-DPP: an improved DNA-binding protein prediction method by exploring local evolutionary information. Information Sciences . 2017;384:135–144. doi: 10.1016/j.ins.2016.06.026. [DOI] [Google Scholar]
  • 26.Ali F., Kabir M., Arif M., et al. DBPPred-PDSD: machine learning approach for prediction of DNA-binding proteins using Discrete Wavelet Transform and optimized integrated features space. Chemometrics and Intelligent Laboratory Systems . 2018;182:21–30. doi: 10.1016/j.chemolab.2018.08.013. [DOI] [Google Scholar]
  • 27.Rahman M. S., Shatabda S., Saha S., Kaykobad M., Rahman M. S. DPP-PseAAC: a DNA-binding protein prediction model using Chou’s general PseAAC. Journal of Theoretical Biology . 2018;452:22–34. doi: 10.1016/j.jtbi.2018.05.006. [DOI] [PubMed] [Google Scholar]
  • 28.Mishra A., Pokhrel P., Hoque M. T. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics . 2018;35(3):433–441. doi: 10.1093/bioinformatics/bty653. [DOI] [PubMed] [Google Scholar]
  • 29.Li G., Du X., Li X., Zou L., Zhang G., Wu Z. Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning. PeerJ . 2021;9 doi: 10.7717/peerj.11262.e11262 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Zhao Z., Yang W., Zhai Y., Liang Y., Zhao Y. Identify DNA-binding proteins through the extreme gradient boosting algorithm. Frontiers in Genetics . 2021;12 doi: 10.3389/fgene.2021.821996.821996 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Du X., Diao Y., Liu H., Li S. MsDBP: exploring DNA-binding proteins by integrating multi-scale sequence information via chou’s 5-steps rule. J Proteome res . 2019;18 doi: 10.1021/acs.jproteome.9b00226. [DOI] [PubMed] [Google Scholar]
  • 32.Ma X., Guo J., Sun X. DNABP: identification of DNA-binding proteins based on feature selection using a random forest and predicting binding residues. PLoS One . 2016;11(12) doi: 10.1371/journal.pone.0167345.e0167345 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zou C., Gong J., Li H. An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis. BMC Bioinformatics . 2013;14(1):p. 90. doi: 10.1186/1471-2105-14-90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Altschul S. F., Madden T. L., Schäffer A. A., et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research . 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Zahiri J., Yaghoubi O., Mohammad-Noori M., Ebrahimpour R., Masoudi-Nejad A. PPIevo: protein–protein interaction prediction from PSSM based evolutionary information. Genomics . 2013;102(4):237–242. doi: 10.1016/j.ygeno.2013.05.006. [DOI] [PubMed] [Google Scholar]
  • 36.Ali F., Hayat M. Machine learning approaches for discrimination of Extracellular Matrix proteins using hybrid feature space. Journal of Theoretical Biology . 2016;403:30–37. doi: 10.1016/j.jtbi.2016.05.011. [DOI] [PubMed] [Google Scholar]
  • 37. Li T., Fan K., Wang J., Wang W. Reduction of protein sequence complexity by residue grouping. Protein Engineering Design and Selection. 2003;16(5):323–330. doi: 10.1093/protein/gzg044.
  • 38. Moshrefi R., Mahjani M. G., Jafarian M. Application of wavelet entropy in analysis of electrochemical noise for corrosion type identification. Electrochemistry Communications. 2014;48:49–51. doi: 10.1016/j.elecom.2014.08.005.
  • 39. Wang X., Wang J., Fu C., Gao Y. Determination of corrosion type by wavelet-based fractal dimension from electrochemical noise. International Journal of Electrochemical Science. 2013;8:7211–7222.
  • 40. Yu B., Li S., Chen C., et al. Prediction subcellular localization of Gram-negative bacterial proteins by support vector machine using wavelet denoising and Chou's pseudo amino acid composition. Chemometrics and Intelligent Laboratory Systems. 2017;167:102–112. doi: 10.1016/j.chemolab.2017.05.009.
  • 41. Hayat M., Khan A., Yeasin M. Prediction of membrane proteins using split amino acid and ensemble classification. Amino Acids. 2012;42(6):2447–2460. doi: 10.1007/s00726-011-1053-5.
  • 42. Ma B., Meng F., Yan G., Yan H., Chai B., Song F. Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data. Computers in Biology and Medicine. 2020;121:103761. doi: 10.1016/j.compbiomed.2020.103761.
  • 43. Wang X., Zhang Y., Yu B., et al. Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis. Computers in Biology and Medicine. 2021;134:104516. doi: 10.1016/j.compbiomed.2021.104516.
  • 44. Akbar S., Khan S., Ali F., Hayat M., Qasim M., Gul S. iHBP-DeepPSSM: identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach. Chemometrics and Intelligent Laboratory Systems. 2020;204:104103. doi: 10.1016/j.chemolab.2020.104103.
  • 45. Ali F., Akbar S., Ghulam A., Maher Z. A., Unar A., Talpur D. B. AFP-CMBPred: computational identification of antifreeze proteins by extending consensus sequences into multi-blocks evolutionary information. Computers in Biology and Medicine. 2021;139:105006. doi: 10.1016/j.compbiomed.2021.105006.
  • 46. Khan I. A., Pi D., Khan N., et al. A privacy-conserving framework based intrusion detection method for detecting and recognizing malicious behaviours in cyber-physical power networks. Applied Intelligence. 2021;51:1–16. doi: 10.1007/s10489-021-02222-8.
  • 47. Chaudhari P., Agrawal H., Kotecha K. Data augmentation using MG-GAN for improved cancer classification on gene expression data. Soft Computing. 2020;24(15):11381–11391. doi: 10.1007/s00500-019-04602-2.
  • 48. Ali F., Hayat M. Classification of membrane protein types using voting feature interval in combination with Chou's pseudo amino acid composition. Journal of Theoretical Biology. 2015;384:78–83. doi: 10.1016/j.jtbi.2015.07.034.
  • 49. Ali F., Kumar H., Patil S., Ahmad A., Babour A., Daud A. Deep-GHBP: improving prediction of Growth Hormone-binding proteins using deep learning model. Biomedical Signal Processing and Control. 2022;78:103856. doi: 10.1016/j.bspc.2022.103856.
  • 50. Ali F., Kumar H., Patil S., Kotecha K., Banjar A., Daud A. Target-DBPPred: an intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting. Computers in Biology and Medicine. 2022;145:105533. doi: 10.1016/j.compbiomed.2022.105533.
  • 51. Barukab O., Ali F., Alghamdi W., Bassam Y., Khan S. A. DBP-CNN: deep learning-based prediction of DNA-binding proteins by coupling discrete cosine transform with two-dimensional convolutional neural network. Expert Systems with Applications. 2022;197:116729. doi: 10.1016/j.eswa.2022.116729.
  • 52. Barukab O., Ali F., Khan S. A. DBP-GAPred: an intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning. Journal of Bioinformatics and Computational Biology. 2021;19(4):2150018. doi: 10.1142/s0219720021500189.
  • 53. Ghulam A., Ali F., Sikander R., Ahmad A., Ahmed A., Patil S. ACP-2DCNN: deep learning-based model for improving prediction of anticancer peptides using two-dimensional convolutional neural network. Chemometrics and Intelligent Laboratory Systems. 2022;226:104589. doi: 10.1016/j.chemolab.2022.104589.
  • 54. Ghulam A., Sikander R., Ali F., Swati Z. N. K., Unar A., Talpur D. B. Accurate prediction of immunoglobulin proteins using machine learning model. Informatics in Medicine Unlocked. 2022;29:100885. doi: 10.1016/j.imu.2022.100885.
  • 55. Khan Z. U., Ali F., Ahmad I., Hayat M., Pi D. iPredCNC: computational prediction model for cancerlectins and non-cancerlectins using novel cascade features subset selection. Chemometrics and Intelligent Laboratory Systems. 2019;195:103876. doi: 10.1016/j.chemolab.2019.103876.
  • 56. Khan Z. U., Ali F., Khan I. A., Hussain Y., Pi D. iRSpot-SPI: deep learning-based recombination spots prediction by incorporating secondary sequence information coupled with physio-chemical properties via Chou's 5-step rule and pseudo components. Chemometrics and Intelligent Laboratory Systems. 2019;189:169–180. doi: 10.1016/j.chemolab.2019.05.003.
  • 57. Khan Z. U., Pi D., Yao S., Nawaz A., Ali F., Ali S. piEnPred: a bi-layered discriminative model for enhancers and their subtypes via novel cascade multi-level subset feature selection algorithm. Frontiers of Computer Science. 2021;15(6):156904–156911. doi: 10.1007/s11704-020-9504-3.
  • 58. Ullah M., Iltaf A., Hou Q., Ali F., Liu C. A foreground extraction approach using convolutional neural network with graph cut. Proceedings of the 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC); June 2018; Chongqing, China. pp. 40–44.
  • 59. Hu J., Zhou X.-G., Zhu Y.-H., Yu D.-J., Zhang G.-J. TargetDBP: accurate DNA-binding protein prediction via sequence-based multi-view feature learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2020;17(4):1419–1429. doi: 10.1109/tcbb.2019.2893634.
  • 60. Du X., Diao Y., Liu H., Li S. MsDBP: exploring DNA-binding proteins by integrating multiscale sequence information via Chou's five-step rule. Journal of Proteome Research. 2019;18(8):3119–3132. doi: 10.1021/acs.jproteome.9b00226.
  • 61. Sanghani G., Kotecha K. Incremental personalized E-mail spam filter using novel TFDCR feature selection with dynamic feature update. Expert Systems with Applications. 2019;115:287–299. doi: 10.1016/j.eswa.2018.07.049.
  • 62. Ahmad A., Akbar S., Hayat M., Ali F., Sohail M. Identification of antioxidant proteins using a discriminative intelligent model of K-space amino acid pairs based descriptors incorporating with ensemble feature selection. Biocybernetics and Biomedical Engineering. 2020;42. doi: 10.1016/j.bbe.2020.10.003.
  • 63. Akbar S., Hayat M., Kabir M., Iqbal M. iAFP-gap-SMOTE: an efficient feature extraction scheme gapped dipeptide composition is coupled with an oversampling technique for identification of antifreeze proteins. Letters in Organic Chemistry. 2019;16(4):294–302. doi: 10.2174/1570178615666180816101653.
  • 64. Joshi G., Walambe R., Kotecha K. A review on explainability in multimodal deep neural nets. IEEE Access. 2021;9:59800–59821. doi: 10.1109/access.2021.3070212.
  • 65. Ali F., Arif M., Khan Z. U., Kabir M., Ahmed S., Yu D.-J. SDBP-Pred: prediction of single-stranded and double-stranded DNA-binding proteins by extending consensus sequence and K-segmentation strategies into PSSM. Analytical Biochemistry. 2020;589:113494. doi: 10.1016/j.ab.2019.113494.
  • 66. Ali F., Ali F., Ghulam A., et al. Deep-PCL: a deep learning model for prediction of cancerlectins and non cancerlectins using optimized integrated features. Chemometrics and Intelligent Laboratory Systems. 2022;221:104484. doi: 10.1016/j.chemolab.2021.104484.
  • 67. Sikander R., Ghulam A., Ali F. XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set. Scientific Reports. 2022;12:5505–5509. doi: 10.1038/s41598-022-09484-3.

Associated Data


Data Availability Statement

The data and code are freely available at https://github.com/Farman335/DBP-DWTPred.


Articles from Computational Intelligence and Neuroscience are provided here courtesy of Wiley