Abstract
Metal ions are significant ligands that bind to proteins and play crucial roles in cell metabolism, material transport, and signal transduction. Accurately predicting protein-metal ion ligand binding residues (PMILBRs) is a challenging task in theoretical calculation. In this study, the authors employed fused amino acids and their derived information as feature parameters to predict PMILBRs using three classical machine learning algorithms, yielding favourable prediction results. A deep learning algorithm was then incorporated, improving the results for the Ca2+ and Mg2+ sets compared with previous studies. The validation matrices identified the optimal prediction model for each ionic ligand, demonstrating the capability to effectively predict the binding sites of metal ion ligands on real protein chains.
Keywords: biocomputers, bioinformatics
1. INTRODUCTION
Proteins achieve their functions by binding to specific ligands [1, 2]. Metal ions, as significant ligands binding to proteins, play a crucial role in cell metabolism, material transport, and signal transduction [3, 4]. To date, our understanding of this molecular mechanism remains limited. Researchers currently rely on experimental measurements and theoretical calculations to identify large-scale protein-metal ion ligand binding residues (PMILBRs). While experimental methods can measure these residues accurately, they are time-consuming, labour-intensive, and costly; theoretical calculation methods can address these shortcomings. Because protein function relies on structure, predicting PMILBRs from structural information is more accurate, but the number of experimentally determined protein structures is limited. Consequently, prediction methods that depend solely on structure lack generalisation ability. Prediction models based on sequence information can address this limitation, yet effectively improving the prediction accuracy of PMILBRs from sequence information alone remains challenging.
In the field of PMILBR research, many explorations of sequence-based feature extraction have been carried out. Jiang et al. [5] extracted amino acid composition information and site conservation information as feature parameters; later researchers applied the discrete increment algorithm, deviation method, scoring algorithm, and position-specific scoring matrix [6, 7, 8, 9] to extract the same information, further improving prediction accuracy. Xu et al. [10] introduced amino acid correlation information as a feature parameter and achieved good prediction results. Amino acid derived information, comprising physicochemical features and predicted structural information, also has a significant impact on PMILBR prediction. Jose et al. [11] selected hydrophilicity and Taylor et al. [12] selected charge as feature parameters, while Wang et al. [13] selected energy; each of these features increased prediction performance. However, non-uniform classification of charge and hydrophilicity can cause information loss, so Wang et al. [13] and Liu et al. [14] improved the extraction of charge and hydrophilicity features by using information entropy.
In terms of predicted structural information, the secondary structure acts as a bridge between the primary and tertiary structures, reflecting the main-chain information of the protein. The ANGLOR software [15] can provide secondary structure information, relative solvent accessibility, and predicted dihedral angle values. Hu et al. [9] selected the predicted relative solvent accessibility as a feature parameter, Cui et al. [16] used the predicted φ and ψ values as feature parameters, and Cao et al. [6] statistically analysed and reclassified the relative solvent accessibility as a feature parameter. Liu et al. [14] reclassified the predicted dihedral angle values through statistical analysis and found that secondary structure, relative solvent accessibility, and dihedral angle as feature parameters can improve prediction. Hu et al. [9], based on sequence and 3D structure information, used the IonCom and IonSeq methods to predict PMILBRs and obtained good results under 5-fold cross-validation. However, because a large number of protein chains in the BioLiP database lack experimental 3D structure information, extracting features from 3D structures remains a challenge. Researchers recently discovered special protein sequence fragments that lack a stable structure and are highly variable; these fragments readily bind ion ligands and are referred to as disordered regions of proteins [17, 18, 19]. Hao et al. [20] obtained the 'disorder' prediction value [21] for each amino acid in a protein sequence using the IUPred2 software [22] and used it as a feature parameter to further improve prediction accuracy. Predicted 3D structure has also proved to be a useful parameter for PMILBR prediction [23]. In the search for 3D structure information, it was found that 10 orthogonal properties clustered from 188 physical and chemical features could describe 3D structural information [23]; You et al. [23] utilised the probability values of these 10 orthogonal factors as feature parameters to predict PMILBRs. In the above references, researchers considered only sequence fragments and did not consider the features of the binding residues alone. Previous work in our group indicated that the usage preferences of the 20 amino acids differ between binding and unbinding residues. Therefore, we introduced the propensity factor of the binding residue as a feature parameter.
In academic studies, various algorithm models exhibit varying prediction accuracies for PMILBRs prediction. Liu et al. [14, 24] utilised the k‐nearest neighbour (KNN) and random forest (RF) algorithms to predict PMILBR, yielding successful prediction results. Wang et al. [25] employed the support vector machine (SVM) algorithm to predict 10 types of metal ion ligands, achieving satisfactory prediction results through cross‐validation and independent testing.
Incomplete parameter extraction may result in information loss. In this study, we not only utilised amino acid information but also incorporated derivative information as feature parameters. The derivative information contained physicochemical feature and predicted structural information. Both classical machine learning algorithms and deep learning algorithms were employed in PMILBR prediction. Further, verification matrices of the optimal models corresponding to each metal ion were presented for detailed analysis.
2. MATERIALS AND METHODS
2.1. Data sets
In this study, we focused on 10 kinds of metal ion ligand-binding residues as the research subject. We filtered the BioLiP database [5, 6, 13, 14] based on sequence similarity <30%, length ≥50 amino acids, and resolution <3 Å. Subsequently, 80% of the protein chains were used as training samples and the remaining ones as independent testing samples. The binding of a protein with an ionic ligand is not solely determined by the binding residues; it is also influenced by the surrounding residues. Therefore, we adopted the sliding window method to intercept fragments of length L. To ensure that each amino acid could appear at the centre of a fragment, we added (L − 1)/2 pseudo-amino acids, represented by X, to both ends of a protein chain. If a binding residue was positioned at (L + 1)/2 of a fragment, the fragment was designated a positive sample; otherwise, it was categorised as a negative sample. The constructed dataset is detailed in Table 1.
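The windowing step described above can be sketched as follows; the function name and the toy sequence are illustrative, not from the original work.

```python
def extract_fragments(sequence, binding_positions, L=15):
    """Slide a window of odd length L along the padded chain; a fragment
    whose centre residue (position (L+1)/2) is a binding residue goes to
    the positive set, all others to the negative set."""
    half = (L - 1) // 2
    padded = "X" * half + sequence + "X" * half  # pseudo-amino acid padding
    positives, negatives = [], []
    for i in range(len(sequence)):  # i indexes the centre residue
        fragment = padded[i:i + L]
        if i in binding_positions:
            positives.append(fragment)
        else:
            negatives.append(fragment)
    return positives, negatives

# Toy chain with binding residues at (0-based) positions 2 and 5.
pos, neg = extract_fragments("MKVLAGHD", {2, 5}, L=5)
```

Here each residue yields exactly one fragment, so a chain of length n produces n samples in total, matching the per-residue counts in Table 1.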
TABLE 1.
The benchmark data sets of 10 metal ion ligands.
Ligand | Total Chains | Total P | Total N | Training Chains | Training P | Training N | Testing Chains | Testing P | Testing N |
---|---|---|---|---|---|---|---|---|---|
Zn2+ | 1428 | 6408 | 405,113 | 1142 | 5145 | 321,161 | 286 | 1263 | 83,952 |
Cu2+ | 117 | 485 | 33,948 | 93 | 377 | 27,548 | 24 | 108 | 6400 |
Fe2+ | 92 | 382 | 29,345 | 73 | 301 | 23,824 | 19 | 81 | 5521 |
Fe3+ | 217 | 1057 | 68,829 | 173 | 859 | 54,945 | 44 | 198 | 13,884 |
Ca2+ | 1237 | 6789 | 396,957 | 989 | 5256 | 312,876 | 248 | 1533 | 84,081 |
Mg2+ | 1461 | 5212 | 480,307 | 1168 | 4069 | 384,365 | 293 | 1143 | 95,942 |
Mn2+ | 459 | 2124 | 156,625 | 367 | 1685 | 124,543 | 92 | 439 | 32,082 |
Na+ | 78 | 489 | 27,408 | 62 | 408 | 22,411 | 16 | 81 | 4997 |
K+ | 57 | 535 | 18,777 | 45 | 410 | 14,882 | 12 | 125 | 3895 |
Co2+ | 194 | 875 | 55,050 | 155 | 707 | 44,300 | 39 | 168 | 10,750 |
Note: Ligands are type of metal ion ligands; Chains is the number of protein chains; P is the number of samples in the positive sets; and N is the number of samples in the negative sets.
Table 1 shows that the number of negative samples far exceeds the number of positive ones. To deal with this serious sample imbalance, we took the number of positive samples as a standard and used the under-sampling technique to select an equal number of negative samples. To ensure the stability of the prediction results, all negative samples participated in training: the negatives were divided into subsets of the same size as the positive set, so that no sample was duplicated across subsets, and each subset was paired with the full positive set in turn. Finally, we took the average of the evaluation indicators over these runs as the final prediction result.
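A minimal sketch of this under-sampling scheme, assuming the shuffled negatives are split into disjoint positive-sized subsets (a remainder smaller than one positive set is dropped in this simplification); the names are hypothetical.

```python
import random

def undersample_rounds(positives, negatives, seed=0):
    """Split the shuffled negatives into disjoint subsets the same size as
    the positive set; each round pairs the full positive set with one
    subset, so samples are never duplicated across rounds."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    n = len(positives)
    return [(positives, shuffled[i:i + n])
            for i in range(0, len(shuffled) - n + 1, n)]

# 30 toy negatives against 10 toy positives -> 3 balanced training rounds;
# the reported indicator is the average over rounds.
rounds = undersample_rounds(list(range(10)), list(range(100, 130)))
```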
2.2. Selection of features parameters
Feature parameters, including amino acids and their derived information such as physicochemical features and predicted structural information, were selected [14, 23, 24, 25]. Research findings indicated the importance of charge, hydrophilicity, hydrophobicity, and energy in predicting PMILBRs [6, 9, 10, 13]; thus, we included these physicochemical features as basic parameters. According to the charge carried after hydrolysis, the 20 amino acids can be divided into three categories: positively charged (K, R, and H); negatively charged (D and E); the remaining amino acids show no electrical properties. According to their hydrophilic and hydrophobic properties, the 20 amino acids were divided into four categories [11]: R, D, E, N, Q, K and H are strongly hydrophilic; L, I, V, A, M and F are strongly hydrophobic; S, T, Y and W are weakly hydrophilic; and P, G and C belong to one further category. During the binding of a protein with ion ligands, the lower the energy, the more stable the structure. We extracted the Laplacian energy values of the 20 amino acids [13] and reclassified them statistically. Taking K+ as an example, the energy was statistically analysed in Figure 1. Based on the size of the difference, we divided the values into 4 categories: I (D, G, N, P, S, T); II (A, E, K, L, Q, R); III (C, F, H, I); and IV (M, V, W, Y).
FIGURE 1.
Energy classification of the K+ ligand. The abscissa is 20 amino acids, and the ordinate is the difference of energy probability.
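The classifications above can be encoded as simple lookup tables; a sketch (the dictionary layout and helper name are ours, the groupings are from the text, with the energy classes being the K+ example):

```python
# Charge after hydrolysis: positive (K, R, H), negative (D, E), rest neutral.
CHARGE = {**{aa: "positive" for aa in "KRH"},
          **{aa: "negative" for aa in "DE"}}

# Four hydrophilicity/hydrophobicity categories [11].
HYDRO = {**{aa: "strongly_hydrophilic" for aa in "RDENQKH"},
         **{aa: "strongly_hydrophobic" for aa in "LIVAMF"},
         **{aa: "weakly_hydrophilic" for aa in "STYW"},
         **{aa: "other" for aa in "PGC"}}

# Energy classes I-IV for the K+ ligand (Figure 1).
K_ENERGY = {**{aa: "I" for aa in "DGNPST"},
            **{aa: "II" for aa in "AEKLQR"},
            **{aa: "III" for aa in "CFHI"},
            **{aa: "IV" for aa in "MVWY"}}

def charge_class(aa):
    """Amino acids outside the positive/negative groups are neutral."""
    return CHARGE.get(aa, "neutral")
```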
The predicted structural information included secondary structure, relative solvent accessibility and dihedral angles (φ and ψ), which were obtained with the ANGLOR software [15]. The secondary structure was divided into 3 categories: α‐helix, β‐sheet and coil; relative solvent accessibility was divided into 4 categories: I (0, 0.2], II (0.2, 0.45], III (0.45, 0.6], and IV (0.6, 0.85]; φ into 2 categories [13]: I (−180°, −75°] and II (−75°, 180°]; and ψ into 3 categories [13]: I (−180°, 15°], II (15°, 135°], and III (135°, 180°].
In the 3D structure information, the disordered value and 10 orthogonal properties were selected as feature parameters, and the disordered values were divided into 2 categories [19]: I (0, 0.5] and II (0.5, 1].
2.3. Extraction of feature parameters
2.3.1. Position weight matrix extracts site conservation information
The position weight matrix is a very effective method for extracting site conservation information. It is widely used in the identification of transcription factor binding sites, functional motif prediction, and other research, and has achieved good results. The matrix values convey the positional specificity of amino acids, providing a quantitative description of the likelihood of an amino acid appearing at a position in a protein sequence. Following state‐of‐the‐art methods [10, 13, 23, 24, 25], we extracted site conservation features from the amino acids, secondary structure, relative solvent accessibility, energy, disorder, φ angle, ψ angle, and amino acid correlation information. The site conservation information was extracted using the position weight matrix, whose elements were expressed as follows:
(1) m_i,j = log2(p_i,j / p_0,j)

where p_i,j = n_i,j / N is the probability of the jth amino acid appearing at the ith site, p_0,j represents the background probability, n_i,j represents the frequency of the jth amino acid at the ith site, N is the number of segments, and j runs over the q possible states (for amino acids, the 20 residues plus X). Two standard scoring matrices can be obtained from the positive and negative training sets (Figure 2), and a 2L‐dimensional feature vector can be obtained for each segment. Here, when extracting the conservation information of amino acid sites, q = 21; for predicted secondary structure, q = 4; relative solvent accessibility, q = 5; energy, q = 5; disorder, q = 3; φ, q = 3; and ψ, q = 4.
FIGURE 2.
Positive and negative set standard scoring matrices.
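A sketch of the position-weight-matrix feature extraction, assuming a log-odds form with a uniform background and a +1 pseudo-count per state (both are our assumptions; the paper does not spell these choices out here):

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids plus pseudo-residue X (q = 21)

def position_weight_matrix(fragments):
    """m[i][j] = log2(p_ij / p0_j): p_ij is the (pseudo-counted) frequency of
    residue j at site i over all fragments, p0_j a uniform background."""
    L, N, q = len(fragments[0]), len(fragments), len(ALPHABET)
    p0 = 1.0 / q
    matrix = []
    for i in range(L):
        counts = {a: 1.0 for a in ALPHABET}  # +1 pseudo-count per state
        for frag in fragments:
            counts[frag[i]] += 1
        matrix.append({a: math.log2((counts[a] / (N + q)) / p0) for a in ALPHABET})
    return matrix

def score(matrix, fragment):
    """Per-site scores of one fragment against one matrix; scoring against the
    positive and negative matrices yields the 2L-dimensional feature vector."""
    return [matrix[i][aa] for i, aa in enumerate(fragment)]

m = position_weight_matrix(["MKV", "MKA", "MKV"])
```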
2.3.2. Information from entropy
As the numbers of amino acids were uneven across the charge and hydrophilicity classes, entropy was introduced to extract the charge and hydrophilicity features and prevent information loss. The entropy formula was expressed as follows:

(2) H = − Σ_{j=1}^{q} (n_j / N) log2(n_j / N)

where n_j represents the frequency of occurrence of the jth class in a segment, and N is the segment length. For the charge classification, q = 4; for the hydrophilicity classification, q = 5.
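A minimal sketch of the entropy feature for one segment, using the charge classification as an example (the helper names are ours):

```python
import math

def segment_entropy(fragment, class_of):
    """Shannon entropy of the class composition of one segment, as in
    Equation (2): H = -sum_j (n_j/N) * log2(n_j/N)."""
    N = len(fragment)
    counts = {}
    for aa in fragment:
        c = class_of(aa)
        counts[c] = counts.get(c, 0) + 1
    return -sum((n / N) * math.log2(n / N) for n in counts.values())

# Charge classes from Section 2.2: positive (KRH), negative (DE), else neutral.
charge_of = lambda aa: "pos" if aa in "KRH" else ("neg" if aa in "DE" else "neu")
```

A segment drawn entirely from one class has zero entropy; an even split over two classes gives the maximum of one bit.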
2.3.3. Propensity factor
The propensity factor was first proposed in the Chou–Fasman method [10] for protein secondary structure prediction. It describes the usage preference of individual amino acids well and has achieved good prediction results in secondary structure prediction. In fact, binding residues themselves have preferences in amino acid usage. Here, we extracted the binding-residue propensity factor as a feature parameter. The formula of the propensity factor is as follows [10]:
(3) P_ij = p_ij / p_i

where p_ij = n_ij / Σ_i n_ij is the frequency of the ith amino acid among the residues of class j, and p_i = Σ_j n_ij / Σ_i Σ_j n_ij is its overall frequency. Here, i represents the 20 amino acids (i = 1, 2, …, 20); j represents binding or unbinding residues (j = 1, 2); and n_ij represents the number of amino acid i among the binding or unbinding residues.
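A sketch of computing propensity factors from residue counts, assuming the ratio-of-frequencies form of Equation (3); the counts below are invented for illustration only.

```python
def propensity_factors(binding_counts, unbinding_counts):
    """P_ij = p_ij / p_i: the frequency of amino acid i within class j
    (binding or unbinding), normalised by its overall frequency. Values
    above 1 mean the amino acid is over-represented in that class."""
    amino_acids = set(binding_counts) | set(unbinding_counts)
    total = {a: binding_counts.get(a, 0) + unbinding_counts.get(a, 0)
             for a in amino_acids}
    grand = sum(total.values())
    result = {}
    for j, counts in (("binding", binding_counts), ("unbinding", unbinding_counts)):
        n_j = sum(counts.values())
        result[j] = {a: (counts.get(a, 0) / n_j) / (total[a] / grand)
                     for a in amino_acids if total[a] > 0}
    return result

# Hypothetical counts: D over-represented, G under-represented among binders.
pf = propensity_factors({"D": 30, "G": 10}, {"D": 10, "G": 50})
```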
2.4. Algorithm
2.4.1. Support vector machine algorithm
Support vector machine is a classical machine learning method. It is widely used in protein structure and binding residue prediction [5, 6]. The core idea is to map the input vector to a high‐dimensional feature space through a non‐linear transformation. Then, by selecting a series of kernel functions and parameter factors, the optimal hyperplane is obtained, which maximises the distance between it and various samples, achieving the greatest generalisation ability. The discriminant function of the optimal hyperplane is as follows:
(4) f(x) = sgn( Σ_{i=1}^{n} α_i y_i K(x_i, x) + b )

where α_i is the Lagrange multiplier, b is the classification threshold, and K(x_i, x) is the inner‐product kernel function. This paper chooses the radial basis function (RBF) kernel:

(5) K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)
The SVM performs excellently on small-sample, non-linear, and high-dimensional pattern recognition problems; however, too many feature parameters may cause over-training. In this paper, the SVM algorithm was run on the Weka 3.8 platform, with C and gamma set to their defaults.
2.4.2. Random forest algorithm
Random forest (RF) is a classification algorithm proposed by Leo Breiman in 2001 [14]. Its main idea is that each decision tree branches on a feature randomly selected from a subset of all features, growing a collection of different decision trees. The RF algorithm generates a random vector to control the growth of each tree in the ensemble, and reduces overfitting by obtaining the final classification through voting. This paper also used the Weka 3.8 platform to implement the RF algorithm, with the size of the random feature subset m = √M (M is the number of feature parameters), the number of decision trees k = 500, and the number of optimisation nodes mtry at its default.
2.4.3. K‐nearest neighbour algorithm
K‐nearest neighbour (KNN) [24] is a statistics-based machine learning classifier proposed by Cover and Hart in 1967. Its basic idea is that the k nearest samples to a test sample are found using a distance formula, and the test sample is then assigned to the category most common among those k samples. The KNN classifier has the advantages of being theoretically mature, easy to understand, and free of training. However, different k values yield different classification results, and the classifier performs best when k takes an appropriate value; choosing a suitable k therefore gives better prediction results. We adopted the KNN classifier on the Weka 3.8 platform.
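The authors ran these three classifiers on the Weka 3.8 platform; as a rough stand-in, the same comparison can be sketched with scikit-learn on toy data (500 trees and m = √M features for RF as stated above, other settings default; the data and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy two-class data standing in for fragment feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),  # RBF kernel; C and gamma at their defaults
    "RF": RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                 random_state=0),  # k = 500 trees, m = sqrt(M)
    "KNN": KNeighborsClassifier(n_neighbors=5),  # k must be tuned in practice
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()  # 5-fold cross-validation
          for name, m in models.items()}
```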
2.4.4. Deep neural network algorithm
Deep neural network (DNN) is a common deep learning algorithm that improves a model's discriminative ability by providing higher-level abstraction. Its layers can be divided into input, hidden, and output layers. In most cases the hyperparameters need to be optimised, and presetting a well-optimised set of hyperparameters can significantly improve training efficiency and prediction accuracy. However, deep learning involves many hyperparameters, and optimising them requires significant computing resources and time. Therefore, referring to previous studies [26, 27], we selected the number of hidden layers, the number of nodes per hidden layer, and the batch size for optimisation. The DNN modules were implemented under the Keras framework in Python.
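The authors implemented the DNN under Keras; as a framework-free illustration of the layered architecture (input layer, fully connected hidden layers, sigmoid output for the binding/non-binding decision), a minimal NumPy forward pass with hypothetical layer sizes:

```python
import numpy as np

def dnn_forward(x, weights, biases):
    """Forward pass of a fully connected network: ReLU hidden layers and a
    sigmoid output giving the binding probability."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)   # hidden layer with ReLU activation
    z = a @ weights[-1] + biases[-1]     # output layer
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid probability

rng = np.random.default_rng(0)
# Input dimension and two 64-node hidden layers: hypothetical sizes; the
# hidden-layer count and node count are exactly the hyperparameters tuned.
layer_sizes = [40, 64, 64, 1]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]
probs = dnn_forward(rng.standard_normal((8, 40)), weights, biases)
```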
2.5. The evaluation index
For the evaluation of the prediction results, we used the metrics commonly employed in PMILBR prediction [23, 24, 25, 28, 29]: sensitivity (S n), specificity (S p), accuracy (Acc), and Matthews correlation coefficient (MCC). The expressions are as follows:
(6) S n = TP / (TP + FN)

(7) S p = TN / (TN + FP)

(8) Acc = (TP + TN) / (TP + TN + FP + FN)

(9) MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
In the formula, the number of metal ion ligand binding residues correctly predicted is TP, otherwise it is FN; the number of metal ion ligand unbinding residues correctly predicted is TN, otherwise it is FP.
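The four indices can be computed directly from the confusion counts; a sketch:

```python
import math

def evaluation_indices(tp, fn, tn, fp):
    """S_n, S_p, Acc and MCC as defined in Equations (6)-(9)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # guard empty classes
    return sn, sp, acc, mcc
```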
3. RESULTS AND DISCUSSION
For the prediction of PMILBRs, the frequently used testing methods are 5‐fold cross‐validation and independent testing [10, 13, 23, 24, 25, 30]. Therefore, this paper also uses these two testing methods to evaluate our algorithm models. The prediction flow chart is shown in Figure 3.
FIGURE 3.
Flow chart for the prediction of PMILBRs. PS, 2L, H, F, V represent component information, site conservation information, information entropy, propensity factors and factor values. aa, ss, sa, φ, ψ, wx, le, dh, qs, 10 factors, respectively, represent amino acids, secondary structure, relative solvent accessibility, dihedral angle, disorder, energy, charge, hydrophilicity and 10 orthogonal factors; RF, SVM, KNN and DNN represent RF algorithm, SVM algorithm, KNN algorithm and DNN. DNN, deep neural network; KNN, k‐nearest neighbour; PMILBRs, protein‐metal ion ligand binding residues; RF, random forest; SVM, support vector machine.
3.1. 5‐Fold cross‐validation prediction results
On the training samples, we extracted the component information and site conservation information, the information entropy of charge and hydrophilicity, the binding residue propensity factors, and the 10 orthogonal factors as prediction parameters. These were input into the RF, SVM, KNN and DNN algorithms, respectively. The prediction results of PMILBRs under 5‐fold cross‐validation are shown in Table 2.
TABLE 2.
5‐Fold cross‐validation prediction results.
Ligand | Modal | S n (%) | S p (%) | Acc (%) | MCC |
---|---|---|---|---|---|
Ca2+ | RF | 81.0 | 81.8 | 81.4 | 0.628 |
Liu's [14] | 94.8 | 85.5 | 90.2 | 0.807 | |
SVM | 78.6 | 77.8 | 78.2 | 0.564 | |
Wang's [25] | 69.5 | 77.8 | 73.6 | 0.474 | |
KNN | 70.6 | 81.9 | 76.2 | 0.528 | |
Liu's [24] | 65.3 | 76.2 | 70.8 | 0.418 | |
DNN | 75.3 | 80.7 | 78.5 | 0.542 | |
Mg2+ | RF | 70.8 | 79.5 | 75.1 | 0.506 |
Liu's [14] | 88.2 | 84.9 | 86.5 | 0.731 | |
SVM | 70.5 | 81.3 | 75.7 | 0.519 | |
Wang's [25] | 70.4 | 75.0 | 72.7 | 0.454 | |
KNN | 70.4 | 74.7 | 72.5 | 0.451 | |
Liu's [24] | 66.7 | 72.6 | 69.7 | 0.394 | |
DNN | 70.9 | 73.1 | 72.0 | 0.441 | |
Zn2+ | RF | 94.1 | 93.8 | 93.9 | 0.879 |
Liu's [14] | 93.0 | 93.2 | 93.1 | 0.862 | |
SVM | 94.9 | 94.0 | 94.0 | 0.880 | |
Wang's [25] | 94.8 | 83.6 | 89.2 | 0.789 | |
KNN | 89.8 | 90.7 | 90.3 | 0.805 | |
Liu's [24] | 94.3 | 83.8 | 89.1 | 0.786 | |
DNN | 91.9 | 89.8 | 90.9 | 0.811 | |
Mn2+ | RF | 84.1 | 88.6 | 86.3 | 0.728 |
Liu's [14] | 84.9 | 89.6 | 87.3 | 0.747 | |
SVM | 84.9 | 87.8 | 86.4 | 0.732 | |
Wang's [25] | 79.8 | 84.9 | 82.3 | 0.648 | |
KNN | 83.2 | 80.3 | 81.8 | 0.636 | |
Liu's [24] | 79.1 | 80.9 | 80.0 | 0.600 | |
DNN | 84.7 | 85.2 | 84.9 | 0.699 | |
Fe2+ | RF | 90.4 | 93.2 | 91.6 | 0.833 |
Liu's [14] | 90.3 | 90.1 | 90.2 | 0.804 | |
SVM | 91.5 | 93.4 | 92.5 | 0.849 | |
Wang's [25] | 91.4 | 90.3 | 90.8 | 0.817 | |
KNN | 92.6 | 80.0 | 86.2 | 0.732 | |
Liu's [24] | 92.1 | 80.4 | 86.3 | 0.730 | |
DNN | 93.7 | 87.4 | 90.5 | 0.814 | |
Fe3+ | RF | 85.8 | 93.0 | 89.4 | 0.790 |
Liu's [14] | 86.4 | 92.3 | 89.4 | 0.789 | |
SVM | 89.7 | 89.8 | 89.4 | 0.795 | |
Wang's [25] | 84.1 | 87.0 | 85.6 | 0.712 | |
KNN | 86.9 | 80.7 | 83.8 | 0.677 | |
Liu's [24] | 84.6 | 84.9 | 84.7 | 0.694 | |
DNN | 85.9 | 89.2 | 87.5 | 0.751 | |
Co2+ | RF | 80.5 | 88.0 | 84.2 | 0.687 |
Liu's [14] | 86.1 | 88.2 | 87.1 | 0.743 | |
SVM | 81.3 | 88.4 | 84.8 | 0.699 | |
Wang's [25] | 76.2 | 84.8 | 80.5 | 0.613 | |
KNN | 78.2 | 75.8 | 77.0 | 0.540 | |
Liu's [24] | 77.6 | 83.1 | 80.3 | 0.608 | |
DNN | 79.2 | 82.3 | 80.7 | 0.617 | |
Cu2+ | RF | 88.7 | 95.6 | 92.1 | 0.845 |
Liu's [14] | 87.8 | 93.4 | 90.6 | 0.814 | |
SVM | 93.8 | 89.8 | 91.8 | 0.837 | |
Wang's [25] | 91.8 | 90.7 | 91.2 | 0.825 | |
KNN | 91.1 | 91.1 | 91.1 | 0.823 | |
Liu's [24] | 92.4 | 86.6 | 89.5 | 0.791 | |
DNN | 95.8 | 75.8 | 85.7 | 0.731 | |
K+ | RF | 91.8 | 93.8 | 94.8 | 0.917 |
Liu's [14] | 89.3 | 71.0 | 80.2 | 0.614 | |
SVM | 80.8 | 92.8 | 90.3 | 0.822 | |
Wang's [25] | 78.1 | 73.6 | 75.9 | 0.518 | |
KNN | 85.0 | 97.0 | 91.0 | 0.826 | |
Liu's [24] | 75.1 | 59.7 | 67.4 | 0.353 | |
DNN | 78.5 | 63.8 | 71.2 | 0.431 | |
Na+ | RF | 86.2 | 85.3 | 85.8 | 0.715 |
Liu's [14] | 88.1 | 76.3 | 82.2 | 0.649 | |
SVM | 83.6 | 85.0 | 89.7 | 0.759 | |
Wang's [25] | 79.6 | 79.6 | 79.6 | 0.591 | |
KNN | 84.2 | 61.3 | 72.4 | 0.468 | |
Liu's [24] | 64.6 | 73.0 | 68.8 | 0.378 | |
DNN | 83.3 | 78.6 | 80.9 | 0.620 |
Overall, the S n , S p , and Acc values of the 10 metal ion ligands under the RF algorithm exceeded 70.8%, 79.5%, and 75.1%, respectively, with MCC values exceeding 0.506. Under the SVM algorithm, they exceeded 70.5%, 77.8%, and 75.7%, respectively, with MCC values exceeding 0.519; in particular, the S p value of Zn2+ reached 94.0%. The S n , S p , and Acc values under the KNN and DNN algorithms exceeded 70.4%, with MCC values exceeding 0.431.
It can be seen from Table 2 that the various ligands have different prediction results under the four algorithms, indicating that each algorithm has its own advantages. Taking Mg2+ and Fe2+ in Figure 4 as examples, panels (A) and (B) show the four evaluation indicators for the Mg2+ and Fe2+ ligands, respectively. The S n values of Mg2+ did not differ significantly across the four classifiers, while its S p , Acc and MCC values under RF and SVM were better than those under KNN and DNN. For Fe2+, the S n values were better under KNN and DNN, and the S p , Acc and MCC values under RF, SVM and DNN were better than those under KNN. Relatively speaking, the SVM algorithm demonstrated superior predictive performance for both ion ligands.
FIGURE 4.
5‐Fold cross‐validation results of different algorithms for Mg2+ and Fe2+.
To better illustrate that the fused feature parameters are useful for predicting PMILBRs, we compared our prediction results with those of Wang [25] and Liu [14, 24], the best results published to date (see Table 2); Liu and Wang used the RF [14], KNN [24], and SVM [25] algorithms in those studies. The prediction results for Fe2+ under RF, SVM and KNN were better than Wang's [25] and Liu's [14, 24], with the S p , Acc and MCC values under RF and SVM exceeding the previous results. Comparing the RF algorithm with Liu's [14] results, the predictions for Zn2+, Fe2+, Fe3+, Cu2+, K+ and Na+ were better, while those for Mn2+ and Co2+ were slightly worse and those for Ca2+ and Mg2+ did not match Liu's [14] results. Comparing the SVM algorithm with Wang's [25] results, the predictions for all 10 metal ions were better. Comparing the KNN algorithm with Liu's [24] results, the results for Fe3+ and Co2+ were slightly lower, while the predictions for the other 8 metal ions were better. These comparisons demonstrate that fusing the prediction parameters yields better results.
3.2. Prediction results of independent test
To test the reliability and practicability of the prediction model, the PMILBRs were tested independently [23, 24, 25]. The prediction results were shown in Table 3.
TABLE 3.
Prediction results of the independent test.
Ligand | Modal | S n (%) | S p (%) | Acc (%) | MCC |
---|---|---|---|---|---|
Ca2+ | RF | 63.7 | 83.5 | 89.3 | 0.161 |
Liu's [14] | 51.1 | 88.7 | 88.1 | 0.163 | |
SVM | 65.8 | 77.9 | 86.0 | 0.150 | |
Wang's [25] | 67.5 | 79.8 | 79.6 | 0.154 | |
KNN | 63.3 | 81.2 | 88.0 | 0.145 | |
DNN | 76.3 | 89.8 | 85.7 | 0.177 | |
Mg2+ | RF | 60.1 | 85.6 | 86.5 | 0.127 |
Liu's [14] | 74.6 | 81.8 | 81.7 | 0.150 | |
SVM | 54.2 | 86.9 | 91.8 | 0.125 | |
Wang's [25] | 72.4 | 80.0 | 79.9 | 0.140 | |
KNN | 70.0 | 68.9 | 80.6 | 0.087 | |
DNN | 66.7 | 88.1 | 87.9 | 0.184 | |
Zn2+ | RF | 94.0 | 97.9 | 98.2 | 0.612 |
Liu's [14] | 92.2 | 90.7 | 90.7 | 0.326 | |
SVM | 93.8 | 96.7 | 97.5 | 0.528 | |
Wang's [25] | 93.0 | 89.8 | 89.9 | 0.315 | |
KNN | 91.9 | 90.6 | 93.9 | 0.328 | |
DNN | 92.1 | 94.7 | 92.1 | 0.367 | |
Mn2+ | RF | 79.7 | 96.4 | 97.2 | 0.420 |
Liu's [14] | 72.9 | 91.9 | 91.7 | 0.262 | |
SVM | 80.9 | 91.5 | 94.4 | 0.287 | |
Wang's [25] | 76.8 | 87.2 | 87.1 | 0.215 | |
KNN | 84.5 | 77.5 | 86.1 | 0.170 | |
DNN | 80.3 | 92.6 | 93.4 | 0.284 | |
Fe2+ | RF | 90.3 | 92.2 | 95.0 | 0.336 |
Liu's [14] | 79.0 | 93.7 | 93.5 | 0.333 | |
SVM | 94.4 | 86.3 | 91.6 | 0.255 | |
Wang's [25] | 87.7 | 85.6 | 85.6 | 0.242 | |
KNN | 86.1 | 66.3 | 78.7 | 0.124 | |
DNN | 91.8 | 95.8 | 92.0 | 0.342 | |
Fe3+ | RF | 73.2 | 96.1 | 96.9 | 0.371 |
Liu's [14] | 72.7 | 94.3 | 94.0 | 0.316 | |
SVM | 84.3 | 88.1 | 92.4 | 0.265 | |
Wang's [25] | 81.3 | 88.0 | 87.9 | 0.243 | |
KNN | 87.7 | 75.1 | 84.5 | 0.176 | |
DNN | 83.8 | 92.2 | 92.1 | 0.327 | |
Co2+ | RF | 71.3 | 97.9 | 97.9 | 0.492 |
Liu's [14] | 75.6 | 87.6 | 87.4 | 0.229 | |
SVM | 75.6 | 90.0 | 93.4 | 0.260 | |
Wang's [25] | 75.6 | 86.2 | 86.1 | 0.215 | |
KNN | 75.0 | 83.5 | 89.6 | 0.191 | |
DNN | 86.4 | 90.2 | 86.9 | 0.326 | |
Cu2+ | RF | 82.0 | 98.0 | 98.2 | 0.534 |
Liu's [14] | 88.0 | 93.9 | 93.8 | 0.399 | |
SVM | 66.7 | 93.3 | 95.0 | 0.287 | |
Wang's [25] | 89.8 | 93.0 | 92.9 | 0.381 | |
KNN | 79.8 | 96.7 | 97.3 | 0.434 | |
DNN | 92.8 | 95.5 | 92.9 | 0.387 | |
K+ | RF | 73.3 | 64.7 | 76.2 | 0.131 |
Liu's [14] | 87.2 | 51.2 | 52.3 | 0.133 | |
SVM | 67.6 | 61.1 | 73.5 | 0.097 | |
Wang's [25] | 73.6 | 70.5 | 70.6 | 0.165 | |
KNN | 84.8 | 54.0 | 68.2 | 0.129 | |
DNN | 63.2 | 89.5 | 65.2 | 0.051 | |
Na+ | RF | 57.7 | 73.4 | 82.9 | 0.094 |
Liu's [14] | 54.3 | 72.8 | 72.5 | 0.076 | |
SVM | 61.9 | 66.7 | 78.4 | 0.081 | |
Wang's [25] | 39.5 | 89.7 | 88.9 | 0.118 | |
KNN | 60.8 | 73.2 | 82.8 | 0.102 | |
DNN | 70.2 | 54.6 | 70.0 | 0.072 |
It can be seen from Table 3 that the S n , S p , and Acc values of the Mg2+, Ca2+, K+ and Na+ ion ligands are all above 54%, with a highest MCC value of 0.184. The S n , S p , and Acc values of the Fe2+, Fe3+, Co2+, Cu2+, Mn2+, and Zn2+ ligands are all above 66.0%, with a highest MCC value of 0.612. In particular, the S n , S p and Acc values of Zn2+ exceed 91%, with MCC values above 0.328.
Using Mg2+ and Fe2+ as examples in Figure 5, panels (A) and (B) display the four evaluation indicators for Mg2+ and Fe2+ ligands, respectively. It is evident from Figure 5 that for Mg2+, the RF algorithm achieved the highest S n value, while DNN outperformed RF, SVM, and KNN in terms of S p and MCC. Conversely, SVM exhibited higher Acc values than the other three algorithms. For Fe2+, the S n , S p , Acc, and MCC values obtained using RF, SVM, and DNN were all superior to those of KNN. Relatively speaking, the DNN algorithm demonstrated superior predictive performance for both types of ion ligands.
FIGURE 5.
Independent test results of different algorithms for Mg2+ and Fe2+.
To facilitate comparison, we also list the state‐of‐the‐art independent test results of Wang [25] and Liu [14] in Table 3. The S n , S p , Acc, and MCC values of Zn2+, Mn2+, Fe3+, and Na+ were better than Liu's [14] predictions, and those of Zn2+, Mn2+, Fe2+, Fe3+, and Co2+ were better than Wang's [25]. The predictions for Ca2+ and Mg2+ were best under the DNN algorithm, with S n values above 66.7% and MCC values above 0.177, showing that deep learning has an advantage in predicting PMILBRs on large-scale data. Thus, fusing multiple features as prediction parameters can increase PMILBR prediction performance.
In summary, it is evident that different algorithms exhibit varying capabilities in recognising metal ions. Taking Mg2+ and Fe2+ as examples, the SVM algorithm performed best in 5‐fold cross‐validation, whereas the DNN algorithm excelled in independent validation.
In fact, for a given protein chain, the questions of interest are which ion ligand it can bind, where the binding sites are, and which residues bind. We therefore provide prediction models under a one‐to‐one strategy to answer these questions.
To this end, we constructed the verification matrix of the trained models against the test sets, taking PMILBRs under the RF algorithm as an example (Table 4). When the independent test sets of the different ionic ligands were put into the Ca2+‐trained model, the Ca2+ test set gave the best prediction results. The values on the diagonal of the verification matrix were generally optimal, showing that each prediction model performs best on its own metal ion ligand and indicating that the models can effectively predict real PMILBRs. Table 4 also shows that the RF prediction models for Zn2+, Cu2+, Co2+ and Mn2+ performed best. There were exceptions: for example, when the test sets of the different ions were applied to the K+‐trained model, the prediction accuracy for Fe2+ and Fe3+ surpassed that for K+.
TABLE 4.
Validation matrix of 10 metal ion ligands based on the random forest (RF) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.161 | 0.112 | 0.156 | 0.143 | 0.107 | 0.098 | 0.109 | 0.126 | 0.094 | 0.056 |
train_Mg2+ | 0.098 | 0.127 | 0.110 | 0.109 | 0.092 | 0.107 | 0.036 | 0.039 | 0.010 | 0.117 |
train_Zn2+ | 0.234 | 0.321 | 0.612 | 0.150 | 0.168 | 0.197 | 0.173 | 0.194 | 0.170 | 0.135 |
train_Mn2+ | 0.221 | 0.245 | 0.308 | 0.420 | 0.123 | 0.112 | 0.042 | 0.096 | 0.098 | 0.074 |
train_Fe2+ | 0.170 | 0.124 | 0.081 | 0.106 | 0.336 | 0.325 | 0.157 | 0.221 | 0.105 | 0.102 |
train_Fe3+ | 0.113 | 0.116 | 0.156 | 0.158 | 0.356 | 0.371 | 0.111 | 0.159 | 0.123 | 0.121 |
train_Co2+ | 0.090 | 0.056 | 0.101 | 0.083 | 0.162 | 0.161 | 0.492 | 0.131 | 0.070 | 0.103 |
train_Cu2+ | 0.230 | 0.189 | 0.201 | 0.109 | 0.206 | 0.194 | 0.214 | 0.534 | 0.109 | 0.107 |
train_K+ | 0.116 | 0.067 | 0.108 | 0.072 | 0.254 | 0.225 | 0.097 | 0.053 | 0.131 | 0.127 |
train_Na+ | 0.071 | 0.078 | 0.088 | 0.054 | 0.030 | 0.060 | 0.069 | 0.031 | 0.033 | 0.094 |
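The diagonal pattern described above can be checked mechanically. The sketch below transcribes the rows of Table 4 and finds, for each trained model, which test set scores highest; it confirms that every model except the one trained on K+ peaks on its own ion (the K+ model peaks on Fe2+).

```python
ions = ["Ca2+", "Mg2+", "Zn2+", "Mn2+", "Fe2+", "Fe3+", "Co2+", "Cu2+", "K+", "Na+"]

# Rows of Table 4: score of each ion's independent test set on each trained RF model.
rf_matrix = {
    "Ca2+": [0.161, 0.112, 0.156, 0.143, 0.107, 0.098, 0.109, 0.126, 0.094, 0.056],
    "Mg2+": [0.098, 0.127, 0.110, 0.109, 0.092, 0.107, 0.036, 0.039, 0.010, 0.117],
    "Zn2+": [0.234, 0.321, 0.612, 0.150, 0.168, 0.197, 0.173, 0.194, 0.170, 0.135],
    "Mn2+": [0.221, 0.245, 0.308, 0.420, 0.123, 0.112, 0.042, 0.096, 0.098, 0.074],
    "Fe2+": [0.170, 0.124, 0.081, 0.106, 0.336, 0.325, 0.157, 0.221, 0.105, 0.102],
    "Fe3+": [0.113, 0.116, 0.156, 0.158, 0.356, 0.371, 0.111, 0.159, 0.123, 0.121],
    "Co2+": [0.090, 0.056, 0.101, 0.083, 0.162, 0.161, 0.492, 0.131, 0.070, 0.103],
    "Cu2+": [0.230, 0.189, 0.201, 0.109, 0.206, 0.194, 0.214, 0.534, 0.109, 0.107],
    "K+":   [0.116, 0.067, 0.108, 0.072, 0.254, 0.225, 0.097, 0.053, 0.131, 0.127],
    "Na+":  [0.071, 0.078, 0.088, 0.054, 0.030, 0.060, 0.069, 0.031, 0.033, 0.094],
}

# For each trained model, find the test set with the highest score.
best_test = {trained: ions[row.index(max(row))] for trained, row in rf_matrix.items()}
# Off-diagonal winners are the exceptions noted in the text.
exceptions = {t: b for t, b in best_test.items() if t != b}
```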
The validation matrices for the other algorithms are provided in the Appendix (Tables A1, A2, A3). From the validation matrices of the different algorithms, we determined the best prediction model for each ion ligand: the DNN algorithm excels in predicting Ca2+, Mg2+, and Fe2+; the RF algorithm performs best for Zn2+, Mn2+, Fe3+, Co2+, Cu2+, and K+; and the KNN algorithm yields the optimal prediction model for Na+.
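Selecting the best algorithm per ion amounts to comparing the same‐ion (diagonal) entries across the four matrices. A short sketch, with the diagonals transcribed from Tables 4 and A1, A2, A3:

```python
ions = ["Ca2+", "Mg2+", "Zn2+", "Mn2+", "Fe2+", "Fe3+", "Co2+", "Cu2+", "K+", "Na+"]

# Diagonal entries (same-ion test score) of each algorithm's validation matrix.
diagonals = {
    "RF":  [0.161, 0.127, 0.612, 0.420, 0.336, 0.371, 0.492, 0.534, 0.131, 0.094],
    "SVM": [0.150, 0.125, 0.528, 0.287, 0.255, 0.265, 0.260, 0.287, 0.097, 0.081],
    "KNN": [0.145, 0.087, 0.328, 0.170, 0.124, 0.176, 0.191, 0.434, 0.129, 0.102],
    "DNN": [0.177, 0.184, 0.367, 0.284, 0.342, 0.327, 0.326, 0.387, 0.051, 0.072],
}

# For each ion, pick the algorithm with the highest same-ion score.
best_model = {
    ion: max(diagonals, key=lambda alg: diagonals[alg][i])
    for i, ion in enumerate(ions)
}
```

Running this reproduces the assignment stated in the text (DNN for Ca2+, Mg2+, Fe2+; RF for Zn2+, Mn2+, Fe3+, Co2+, Cu2+, K+; KNN for Na+).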
4. CONCLUSION
Precisely predicting PMILBRs is critical for comprehending protein function. In this research, we first analysed PMILBRs using a combination of single‐residue and fragment information, which provides a more comprehensive view because specific amino acids are preferred at binding sites and neighbouring residues influence the residue combination. We also integrated beneficial biological information to avoid losses of information that would reduce prediction accuracy. From the biological background of binding sites, structural information and physicochemical characteristics are important factors affecting the binding of ion ligands to proteins. Therefore, based on the primary sequence of the protein, we extracted the corresponding predicted secondary structure information (secondary structure, dihedral angles, and surface accessibility), tertiary structure information (disorder values and 10 orthogonal factors), and physicochemical characteristics (hydrophobicity, charge, and energy), and fused the amino acids with this derived information as predictive feature parameters. Finally, we selected four prediction algorithms and screened for the optimal prediction model based on the prediction results. Because independent test samples are completely unrelated to the training samples, independent testing can fully verify the practicality of the model.
In this article, a comprehensive feature set comprising amino acids, the physicochemical characteristics of three amino acids, predicted secondary structure information, 10 orthogonal factors, and disorder values was used as the prediction parameters for various algorithms to predict PMILBRs. Following 5‐fold cross‐validation and independent testing, we obtained an effective prediction method for PMILBRs. To solve the practical problem of determining which metal ion ligands a given protein chain can bind, we proposed a validation matrix under a "one‐to‐one" strategy to discover PMILBRs. The results showed that the diagonal values of the validation matrix are the best, indicating that the prediction models can effectively predict PMILBR binding. Therefore, the prediction model obtained by fusing multiple feature parameters can serve as a valuable tool for predicting PMILBRs.
AUTHOR CONTRIBUTIONS
Caiyun Yang: Data curation; software; writing – original draft. Xiuzhen Hu: Project administration; writing – review & editing. Zhenxing Feng: Project administration; writing – review & editing. Sixi Hao: Resources; software. Gaimei Zhang: Formal analysis; resources. Shaohua Chen: Data curation; writing – review & editing. Guodong Guo: Data curation; writing – review & editing.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
PATIENT CONSENT STATEMENT
Not applicable.
PERMISSION TO REPRODUCE MATERIAL FROM OTHER SOURCES
Not applicable.
CLINICAL TRIAL REGISTRATION
Not applicable.
ACKNOWLEDGEMENT
This work was supported by the Natural Science Foundation of China (61961032), the Natural Science Foundation of Inner Mongolia of China (2024MS06027), and the Basic Scientific Research Operating Expenses of Inner Mongolia of China (JY20230067).
APPENDIX A
TABLE A1.
Validation matrix of 10 metal ion ligands based on the support vector machine (SVM) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.150 | 0.075 | 0.023 | 0.062 | 0.127 | 0.114 | 0.141 | 0.028 | 0.058 | 0.086 |
train_Mg2+ | 0.109 | 0.125 | 0.045 | 0.112 | 0.101 | 0.121 | 0.100 | 0.108 | 0.015 | 0.118 |
train_Zn2+ | 0.234 | 0.320 | 0.528 | 0.140 | 0.246 | 0.276 | 0.183 | 0.208 | 0.145 | 0.127 |
train_Mn2+ | 0.090 | 0.076 | 0.251 | 0.287 | 0.124 | 0.113 | 0.114 | 0.157 | 0.094 | 0.095 |
train_Fe2+ | 0.115 | 0.078 | 0.067 | 0.079 | 0.255 | 0.215 | 0.107 | 0.084 | 0.086 | 0.096 |
train_Fe3+ | 0.135 | 0.080 | 0.137 | 0.100 | 0.214 | 0.265 | 0.161 | 0.152 | 0.101 | 0.118 |
train_Co2+ | 0.062 | 0.037 | 0.123 | 0.051 | 0.042 | 0.080 | 0.260 | 0.095 | 0.114 | 0.085 |
train_Cu2+ | 0.024 | 0.014 | 0.146 | 0.015 | 0.009 | 0.019 | 0.063 | 0.287 | 0.111 | 0.035 |
train_K+ | 0.056 | 0.040 | 0.090 | 0.066 | 0.054 | 0.057 | 0.093 | 0.041 | 0.097 | 0.025 |
train_Na+ | 0.076 | 0.059 | 0.065 | 0.069 | 0.073 | 0.074 | 0.056 | 0.049 | 0.047 | 0.081 |
TABLE A2.
Validation matrix of 10 metal ion ligands based on the k‐nearest neighbour (KNN) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.145 | 0.124 | 0.076 | 0.098 | 0.083 | 0.108 | 0.062 | 0.089 | 0.090 | 0.060 |
train_Mg2+ | 0.065 | 0.087 | 0.037 | 0.054 | 0.092 | 0.069 | 0.070 | 0.043 | 0.048 | 0.059 |
train_Zn2+ | 0.197 | 0.128 | 0.328 | 0.210 | 0.056 | 0.029 | 0.035 | 0.041 | 0.018 | 0.010 |
train_Mn2+ | 0.064 | 0.021 | 0.092 | 0.170 | 0.040 | 0.042 | 0.098 | 0.019 | 0.034 | 0.018 |
train_Fe2+ | 0.083 | 0.073 | 0.065 | 0.093 | 0.124 | 0.127 | 0.042 | 0.038 | 0.025 | 0.067 |
train_Fe3+ | 0.099 | 0.131 | 0.152 | 0.107 | 0.127 | 0.176 | 0.024 | 0.018 | 0.014 | 0.035 |
train_Co2+ | 0.077 | 0.098 | 0.106 | 0.012 | 0.090 | 0.156 | 0.191 | 0.127 | 0.106 | 0.089 |
train_Cu2+ | 0.085 | 0.073 | 0.140 | 0.145 | 0.315 | 0.245 | 0.204 | 0.434 | 0.089 | 0.067 |
train_K+ | 0.076 | 0.110 | 0.107 | 0.115 | 0.090 | 0.109 | 0.092 | 0.117 | 0.129 | 0.021 |
train_Na+ | 0.094 | 0.081 | 0.022 | 0.055 | 0.061 | 0.053 | 0.078 | 0.070 | 0.075 | 0.102 |
TABLE A3.
Validation matrix of 10 metal ion ligands based on the deep neural network (DNN) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.177 | 0.080 | 0.114 | 0.105 | 0.171 | 0.212 | 0.134 | 0.117 | 0.122 | 0.035 |
train_Mg2+ | 0.129 | 0.184 | 0.174 | 0.124 | 0.260 | 0.251 | 0.208 | 0.269 | 0.097 | 0.098 |
train_Zn2+ | 0.053 | 0.024 | 0.367 | 0.139 | 0.265 | 0.232 | 0.099 | 0.216 | 0.006 | 0.051 |
train_Mn2+ | 0.064 | 0.097 | 0.104 | 0.284 | 0.203 | 0.197 | 0.089 | 0.174 | 0.058 | 0.067 |
train_Fe2+ | 0.102 | 0.089 | 0.145 | 0.189 | 0.342 | 0.267 | 0.105 | 0.201 | 0.089 | 0.092 |
train_Fe3+ | 0.143 | 0.079 | 0.178 | 0.201 | 0.310 | 0.327 | 0.176 | 0.189 | 0.053 | 0.021 |
train_Co2+ | 0.099 | 0.101 | 0.107 | 0.174 | 0.243 | 0.296 | 0.326 | 0.186 | 0.064 | 0.048 |
train_Cu2+ | 0.107 | 0.089 | 0.092 | 0.168 | 0.289 | 0.307 | 0.273 | 0.387 | 0.092 | 0.071 |
train_K+ | 0.010 | 0.045 | 0.017 | 0.032 | 0.047 | 0.028 | 0.015 | 0.009 | 0.051 | 0.044 |
train_Na+ | 0.049 | 0.037 | 0.056 | 0.028 | 0.068 | 0.058 | 0.066 | 0.047 | 0.060 | 0.072 |
Yang, C. , et al.: The optimised model of predicting protein‐metal ion ligand binding residues. IET Syst. Biol. e70001 (2025). 10.1049/syb2.70001
Contributor Information
Xiuzhen Hu, Email: hxz@imut.edu.cn.
Zhenxing Feng, Email: zxfeng@imut.edu.cn.
DATA AVAILABILITY STATEMENT
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
REFERENCES
1. Akam, E.A., et al.: Disulfide‐masked iron prochelators: effects on cell death, proliferation, and hemoglobin production. J. Inorg. Biochem. 180, 186–193 (2018). 10.1016/j.jinorgbio.2017.12.016
2. Brailoiu, E., et al.: Mechanisms of modulation of brain microvascular endothelial cells function by thrombin. Brain Res. 1657, 167–175 (2016). 10.1016/j.brainres.2016.12.011
3. Reif, D.W.: Ferritin as a source of iron for oxidative damage. Free Radical Biol. Med. 12(5), 417–427 (1992). 10.1016/0891-5849(92)90091-t
4. Reed, G.H., Poyner, R.R.: Mn2+ as a probe of divalent metal ion binding and function in enzymes and other proteins. Met. Ions Biol. Syst. 37(12), 183–207 (2000)
5. Jiang, Z., et al.: Identification of Ca(2+)‐binding residues of a protein from its primary sequence. Genet. Mol. Res. 15(2) (2016). 10.4238/gmr.15027618
6. Cao, X.Y., et al.: Identification of metal ion binding sites based on amino acid sequences. PLoS One 12(8), e0183756 (2017). 10.1371/journal.pone.0183756
7. Horst, J.A., Samudrala, R.: A protein sequence meta‐functional signature for calcium binding residue prediction. Pattern Recogn. Lett. 31(14), 2103–2112 (2010). 10.1016/j.patrec.2010.04.012
8. Mazumder, M., et al.: Prediction and analysis of canonical EF hand loop and qualitative estimation of Ca2+ binding affinity. PLoS One 9(4), e96202 (2014). 10.1371/journal.pone.0096202
9. Hu, X.Z., et al.: Recognizing metal and acid radical ion‐binding sites by integrating ab initio modeling with template‐based transferals. Bioinformatics 32(21), 3260–3269 (2016). 10.1093/bioinformatics/btw396
10. Xu, S., et al.: Recognition of metal ion ligand‐binding residues by adding correlation features and propensity factors. Front. Genet. 12, 793800 (2022). 10.3389/fgene.2021.793800
11. Josef, P., Ingvar, E., Rein, A.: A new method for identification of protein (sub)families in a set of proteins based on hydropathy distribution in proteins. Proteins: Struct., Funct., Bioinf. 58(4), 923–934 (2010)
12. Taylor, W.R.: The classification of amino acid conservation. J. Theor. Biol. 119(2), 205–218 (1986). 10.1016/s0022-5193(86)80075-3
13. Wang, S., et al.: Recognizing ion ligand binding sites by SMO algorithm. BMC Mol. Cell Biol. 20(Suppl 3), 53 (2019). 10.1186/s12860-019-0237-9
14. Liu, L., et al.: Recognizing ion ligand‐binding residues by random forest algorithm based on optimized dihedral angle. Front. Bioeng. Biotechnol. 8, 493 (2020). 10.3389/fbioe.2020.00493
15. Wu, S., Zhang, Y.: ANGLOR: a composite machine‐learning algorithm for protein backbone torsion angle prediction. PLoS One 3(10), e3400 (2008). 10.1371/journal.pone.0003400
16. Cui, Y.F., et al.: Predicting protein‐ligand binding residues with deep convolutional neural networks. BMC Bioinf. 20(1), 93 (2019). 10.1186/s12859-019-2672-1
17. Anfinsen, C.B., et al.: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. 47(9), 1309–1314 (1961). 10.1073/pnas.47.9.1309
18. Dunker, A.K., et al.: Intrinsic disorder and protein function. Biochemistry 41(21), 6573–6582 (2002). 10.1021/bi012159+
19. Noivirt‐Brik, O., Prilusky, J., Sussman, J.L.: Assessment of disorder predictions in CASP8. Proteins: Struct., Funct., Bioinf. 77(Suppl 9), 210–216 (2009). 10.1002/prot.22586
20. Hao, S.X., et al.: Prediction of metal ion ligand binding residues by adding disorder value and propensity factors based on deep learning algorithm. Front. Genet. 13, 969412 (2022). 10.3389/fgene.2022.969412
21. Bálint, M., Gábor, E., Zsuzsanna, D.: IUPred2A: context‐dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46(W1), W329–W337 (2018). 10.1093/nar/gky384
22. Erdős, G., Dosztányi, Z.: Analyzing protein disorder with IUPred2A. Curr. Prot. Bioinf. 70(1) (2020). 10.1002/cpbi.99
23. You, X.X., et al.: Recognizing protein‐metal ion ligands binding residues by random forest algorithm with adding orthogonal properties. Comput. Biol. Chem. 98, 107693 (2022). 10.1016/j.compbiolchem.2022.107693
24. Liu, L., et al.: Prediction of acid radical ion binding residues by K‐nearest neighbors classifier. BMC Mol. Cell Biol. 20(Suppl 3), 52–61 (2019). 10.1186/s12860-019-0238-8
25. Wang, S., et al.: Recognition of ion ligand binding sites based on amino acid features with the fusion of energy, physicochemical and structural features. Curr. Pharmaceut. Des. 27(8), 1093–1102 (2021). 10.2174/1381612826666201029100636
26. Young, S.R., et al.: Optimizing deep learning hyper‐parameters through an evolutionary algorithm. In: MLHPC '15: Proceedings of the Workshop on Machine Learning in High‐Performance Computing Environments, vol. 4, pp. 15 (2015)
27. Koutsoukas, A., et al.: Deep‐learning: investigating deep neural networks hyper‐parameters and comparison of performance to shallow methods for modeling bioactivity data. J. Cheminf. 9(1), 42 (2017). 10.1186/s13321-017-0226-y
28. Han, H., et al.: Interpretable machine learning assessment. Neurocomputing 561, 126891 (2023). 10.1016/j.neucom.2023.126891
29. Zou, X., et al.: Accurately identifying hemagglutinin using sequence information and machine learning methods. Front. Med. 10, 1281880 (2023). 10.3389/fmed.2023.1281880
30. Zulfiqar, H., et al.: Deep‐STP: a deep learning‐based approach to predict snake toxin proteins by using word embeddings. Front. Med. 10, 1291352 (2024). 10.3389/fmed.2023.1291352