Abstract
Metal ions are significant ligands that bind to proteins and play crucial roles in cell metabolism, material transport, and signal transduction. Accurately predicting protein-metal ion ligand binding residues (PMILBRs) is a challenging task in theoretical calculation. In this study, the authors employed fused amino acids and their derived information as feature parameters to predict PMILBRs using three classical machine learning algorithms, yielding favourable prediction results. A deep learning algorithm was then incorporated, improving the results for the Ca2+ and Mg2+ sets compared with previous studies. The validation matrices identified the optimal prediction model for each ionic ligand, demonstrating the capability to effectively predict the binding sites of metal ion ligands on real protein chains.
Keywords: biocomputers, bioinformatics
1. INTRODUCTION
Proteins achieve their functions by binding to specific ligands [1, 2]. Metal ions, as significant ligands binding to proteins, play a crucial role in cell metabolism, material transport, and signal transduction [3, 4]. To date, our understanding of this molecular mechanism remains limited. Researchers currently rely on experimental measurements and theoretical calculations to identify large-scale protein-metal ion ligand binding residues (PMILBRs). While experimental methods can measure these residues accurately, they are time-consuming, labour-intensive, and costly; theoretical calculation methods can address these shortcomings. Because protein function relies on structure, predicting PMILBRs from structural information is more accurate, but the number of experimentally determined protein structures is limited. Consequently, prediction methods that depend solely on structure lack generalisation ability. Prediction models based on sequence information can address this limitation, yet effectively improving the prediction accuracy of PMILBRs from sequence information alone remains challenging.
In the field of PMILBR research, many explorations of sequence-based feature extraction have been carried out. Jiang et al. [5] extracted amino acid composition information and site conservation information as feature parameters; later researchers applied the discrete increment algorithm, deviation method, scoring algorithm, and position-specific scoring matrix [6, 7, 8, 9] to extract the same information, further improving prediction accuracy. Xu et al. [10] introduced amino acid correlation information as a feature parameter and achieved good prediction results. Amino acid derived information, comprising physicochemical features and predicted structural information, also has a significant impact on PMILBR prediction. Jose et al. [11] selected hydrophilicity and Taylor et al. [12] selected charge as feature parameters, while Wang et al. [13] selected energy; each of these features increased prediction performance. However, non-uniform classification of charge and hydrophilicity can cause information loss, so Wang et al. [13] and Liu et al. [14] improved the extraction of charge and hydrophilicity features by using information entropy.
In terms of predicted structural information, the secondary structure acts as a bridge between the primary and tertiary structures, reflecting the main-chain information of the protein. The ANGLOR software [15] can provide secondary structure information, relative solvent accessibility, and predicted dihedral angle values. Hu et al. [9] selected the predicted relative solvent accessibility as a feature parameter, Cui et al. [16] used the predicted φ and ψ values as feature parameters, and Cao et al. [6] statistically analysed and reclassified the relative solvent accessibility as a feature parameter. Liu et al. [14] reclassified the predicted dihedral angle values through statistical analysis and found that secondary structure, relative solvent accessibility, and dihedral angle as feature parameters can improve prediction. Hu et al. [9], based on sequence and 3D structure information, used the IonCom and IonSeq methods to predict PMILBRs and obtained good results under 5-fold cross-validation. However, because a large number of protein chains in the BioLiP database lack experimental 3D structure information, extracting features from 3D structures remains a challenge. Researchers recently discovered special protein sequence fragments that lack a stable structure and are highly variable; these fragments readily bind ion ligands and are referred to as disordered regions of proteins [17, 18, 19]. Hao et al. [20] obtained the 'disorder' prediction value [21] for each amino acid in a protein sequence using the IUPred2 software [22] and used it as a feature parameter to further improve prediction accuracy. Predicted 3D structure has also proved to be a useful parameter for PMILBR prediction [23]. In the search for 3D structure information, it was found that 10 orthogonal properties clustered from 188 physical and chemical features could describe 3D structural information [23]; You et al. [23] utilised the probability values of these 10 orthogonal factors as feature parameters to predict PMILBRs. In the above references, researchers considered only sequence fragments and did not consider the features of the binding residues alone. Previous work in our group indicated that the usage preferences of the 20 amino acids differ between binding and unbinding residues. Therefore, we introduced the propensity factor of the binding residue as a feature parameter.
In academic studies, various algorithm models exhibit varying prediction accuracies for PMILBRs prediction. Liu et al. [14, 24] utilised the k‐nearest neighbour (KNN) and random forest (RF) algorithms to predict PMILBR, yielding successful prediction results. Wang et al. [25] employed the support vector machine (SVM) algorithm to predict 10 types of metal ion ligands, achieving satisfactory prediction results through cross‐validation and independent testing.
Incomplete parameter extraction may result in information loss. In this study, we not only utilised amino acid information but also incorporated derivative information as feature parameters. The derivative information contained physicochemical feature and predicted structural information. Both classical machine learning algorithms and deep learning algorithms were employed in PMILBR prediction. Further, verification matrices of the optimal models corresponding to each metal ion were presented for detailed analysis.
2. MATERIALS AND METHODS
2.1. Data sets
In this study, we focused on 10 kinds of metal ion ligand-binding residues as the research subject. We filtered the BioLiP database [5, 6, 13, 14] based on sequence similarity <30%, length ≥50 amino acids, and resolution <3 Å. Subsequently, 80% of the protein chains were used as training samples and the remaining ones as independent testing samples. The binding of a protein with an ionic ligand is not solely determined by the binding residues; it is also influenced by the surrounding residues. Therefore, we adopted the sliding window method to intercept fragments of length L. To ensure that each amino acid could appear at the centre of a fragment, we added (L − 1)/2 pseudo-amino acids, represented by X, to both ends of a protein chain. If a binding residue was positioned at (L + 1)/2 of a fragment, the fragment was designated a positive sample; otherwise, it was categorised as a negative sample. The constructed dataset is detailed in Table 1.
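The windowing step described above can be sketched as follows; the function name and the toy sequence are illustrative, not from the original work.

```python
def extract_fragments(sequence, binding_positions, L=15):
    """Slide a window of odd length L along the padded chain; a fragment
    whose centre residue (position (L+1)/2) is a binding residue goes to
    the positive set, all others to the negative set."""
    half = (L - 1) // 2
    padded = "X" * half + sequence + "X" * half  # pseudo-amino acid padding
    positives, negatives = [], []
    for i in range(len(sequence)):  # i indexes the centre residue
        fragment = padded[i:i + L]
        if i in binding_positions:
            positives.append(fragment)
        else:
            negatives.append(fragment)
    return positives, negatives

# Toy chain with binding residues at (0-based) positions 2 and 5.
pos, neg = extract_fragments("MKVLAGHD", {2, 5}, L=5)
```

Here each residue yields exactly one fragment, so a chain of length n produces n samples in total, matching the per-residue counts in Table 1.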
TABLE 1.
The benchmark data sets of 10 metal ion ligands.
Ligand | Total Chains | Total P | Total N | Training Chains | Training P | Training N | Testing Chains | Testing P | Testing N |
---|---|---|---|---|---|---|---|---|---|
Zn2+ | 1428 | 6408 | 405,113 | 1142 | 5145 | 321,161 | 286 | 1263 | 83,952 |
Cu2+ | 117 | 485 | 33,948 | 93 | 377 | 27,548 | 24 | 108 | 6400 |
Fe2+ | 92 | 382 | 29,345 | 73 | 301 | 23,824 | 19 | 81 | 5521 |
Fe3+ | 217 | 1057 | 68,829 | 173 | 859 | 54,945 | 44 | 198 | 13,884 |
Ca2+ | 1237 | 6789 | 396,957 | 989 | 5256 | 312,876 | 248 | 1533 | 84,081 |
Mg2+ | 1461 | 5212 | 480,307 | 1168 | 4069 | 384,365 | 293 | 1143 | 95,942 |
Mn2+ | 459 | 2124 | 156,625 | 367 | 1685 | 124,543 | 92 | 439 | 32,082 |
Na+ | 78 | 489 | 27,408 | 62 | 408 | 22,411 | 16 | 81 | 4997 |
K+ | 57 | 535 | 18,777 | 45 | 410 | 14,882 | 12 | 125 | 3895 |
Co2+ | 194 | 875 | 55,050 | 155 | 707 | 44,300 | 39 | 168 | 10,750 |
Note: Ligands are type of metal ion ligands; Chains is the number of protein chains; P is the number of samples in the positive sets; and N is the number of samples in the negative sets.
Table 1 shows that the number of negative samples far exceeds the number of positive ones. To deal with this serious sample imbalance, we took the number of positive samples as a standard and used the under-sampling technique to select an equal number of negative samples. To ensure the stability of the prediction results, all negative samples participated in training: the negatives were divided into subsets of the same size as the positive set, so that no sample was duplicated across subsets, and each subset was paired with the full positive set in turn. Finally, we took the average of the evaluation indicators over these runs as the final prediction result.
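A minimal sketch of this under-sampling scheme, assuming the shuffled negatives are split into disjoint positive-sized subsets (a remainder smaller than one positive set is dropped in this simplification); the names are hypothetical.

```python
import random

def undersample_rounds(positives, negatives, seed=0):
    """Split the shuffled negatives into disjoint subsets the same size as
    the positive set; each round pairs the full positive set with one
    subset, so samples are never duplicated across rounds."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    n = len(positives)
    return [(positives, shuffled[i:i + n])
            for i in range(0, len(shuffled) - n + 1, n)]

# 30 toy negatives against 10 toy positives -> 3 balanced training rounds;
# the reported indicator is the average over rounds.
rounds = undersample_rounds(list(range(10)), list(range(100, 130)))
```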
2.2. Selection of features parameters
Feature parameters, including amino acids and their derived information such as physicochemical features and predicted structural information, were selected [14, 23, 24, 25]. Research findings indicated the importance of charge, hydrophilicity, hydrophobicity, and energy in predicting PMILBRs [6, 9, 10, 13]; thus, we included these physicochemical features as basic parameters. According to the charge carried after hydrolysis, the 20 amino acids can be divided into three categories: positively charged (K, R, and H); negatively charged (D and E); the remaining amino acids show no electrical properties. According to their hydrophilic and hydrophobic properties, the 20 amino acids were divided into four categories [11]: R, D, E, N, Q, K and H are strongly hydrophilic; L, I, V, A, M and F are strongly hydrophobic; S, T, Y and W are weakly hydrophilic; and P, G and C belong to one further category. During the binding of a protein with ion ligands, the lower the energy, the more stable the structure. We extracted the Laplacian energy values of the 20 amino acids [13] and reclassified them statistically. Taking K+ as an example, the energy was statistically analysed in Figure 1. Based on the size of the difference, we divided the values into 4 categories: I (D, G, N, P, S, T); II (A, E, K, L, Q, R); III (C, F, H, I); and IV (M, V, W, Y).
FIGURE 1.
Energy classification of the K+ ligand. The abscissa is 20 amino acids, and the ordinate is the difference of energy probability.
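The classifications above can be encoded as simple lookup tables; a sketch (the dictionary layout and helper name are ours, the groupings are from the text, with the energy classes being the K+ example):

```python
# Charge after hydrolysis: positive (K, R, H), negative (D, E), rest neutral.
CHARGE = {**{aa: "positive" for aa in "KRH"},
          **{aa: "negative" for aa in "DE"}}

# Four hydrophilicity/hydrophobicity categories [11].
HYDRO = {**{aa: "strongly_hydrophilic" for aa in "RDENQKH"},
         **{aa: "strongly_hydrophobic" for aa in "LIVAMF"},
         **{aa: "weakly_hydrophilic" for aa in "STYW"},
         **{aa: "other" for aa in "PGC"}}

# Energy classes I-IV for the K+ ligand (Figure 1).
K_ENERGY = {**{aa: "I" for aa in "DGNPST"},
            **{aa: "II" for aa in "AEKLQR"},
            **{aa: "III" for aa in "CFHI"},
            **{aa: "IV" for aa in "MVWY"}}

def charge_class(aa):
    """Amino acids outside the positive/negative groups are neutral."""
    return CHARGE.get(aa, "neutral")
```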
The predicted structural information included secondary structure, relative solvent accessibility and dihedral angles (φ and ψ), which were obtained with the ANGLOR software [15]. The secondary structure was divided into 3 categories: α‐helix, β‐sheet and coil; relative solvent accessibility was divided into 4 categories: I (0, 0.2], II (0.2, 0.45], III (0.45, 0.6], and IV (0.6, 0.85]; φ into 2 categories [13]: I (−180°, −75°] and II (−75°, 180°]; and ψ into 3 categories [13]: I (−180°, 15°], II (15°, 135°], and III (135°, 180°].
In the 3D structure information, the disordered value and 10 orthogonal properties were selected as feature parameters, and the disordered values were divided into 2 categories [19]: I (0, 0.5] and II (0.5, 1].
2.3. Extraction of feature parameters
2.3.1. Position weight matrix extracts site conservation information
The position weight matrix is a very effective method for extracting site conservation information. It is widely used in the identification of transcription factor binding sites, functional motif prediction, and other research, and has achieved good results. The matrix values convey the positional specificity of amino acids, providing a quantitative description of the likelihood of an amino acid appearing at a position in a protein sequence. Following state‐of‐the‐art methods [10, 13, 23, 24, 25], we extracted site conservation features from the amino acids, secondary structure, relative solvent accessibility, energy, disorder, φ angle, ψ angle, and amino acid correlation information. The site conservation information was extracted using the position weight matrix, whose elements were expressed as follows:
(1) m_i,j = log2(p_i,j / p_0,j)

where p_i,j = n_i,j / N is the probability of the jth amino acid appearing at the ith site, p_0,j represents the background probability, n_i,j represents the frequency of the jth amino acid at the ith site, N is the number of segments, and j runs over the q possible states (for amino acids, the 20 residues plus X). Two standard scoring matrices can be obtained from the positive and negative training sets (Figure 2), and a 2L‐dimensional feature vector can be obtained for each segment. Here, when extracting the conservation information of amino acid sites, q = 21; for predicted secondary structure, q = 4; relative solvent accessibility, q = 5; energy, q = 5; disorder, q = 3; φ, q = 3; and ψ, q = 4.
FIGURE 2.
Positive and negative set standard scoring matrices.
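A sketch of the position-weight-matrix feature extraction, assuming a log-odds form with a uniform background and a +1 pseudo-count per state (both are our assumptions; the paper does not spell these choices out here):

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids plus pseudo-residue X (q = 21)

def position_weight_matrix(fragments):
    """m[i][j] = log2(p_ij / p0_j): p_ij is the (pseudo-counted) frequency of
    residue j at site i over all fragments, p0_j a uniform background."""
    L, N, q = len(fragments[0]), len(fragments), len(ALPHABET)
    p0 = 1.0 / q
    matrix = []
    for i in range(L):
        counts = {a: 1.0 for a in ALPHABET}  # +1 pseudo-count per state
        for frag in fragments:
            counts[frag[i]] += 1
        matrix.append({a: math.log2((counts[a] / (N + q)) / p0) for a in ALPHABET})
    return matrix

def score(matrix, fragment):
    """Per-site scores of one fragment against one matrix; scoring against the
    positive and negative matrices yields the 2L-dimensional feature vector."""
    return [matrix[i][aa] for i, aa in enumerate(fragment)]

m = position_weight_matrix(["MKV", "MKA", "MKV"])
```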
2.3.2. Information from entropy
As the numbers of amino acids were uneven across the charge and hydrophilicity classes, entropy was introduced to extract the charge and hydrophilicity features and prevent information loss. The entropy formula was expressed as follows:

(2) H = − Σ_{j=1}^{q} (n_j / N) log2(n_j / N)

where n_j represents the frequency of occurrence of the jth class in a segment, and N is the segment length. For the charge classification, q = 4; for the hydrophilicity classification, q = 5.
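A minimal sketch of the entropy feature for one segment, using the charge classification as an example (the helper names are ours):

```python
import math

def segment_entropy(fragment, class_of):
    """Shannon entropy of the class composition of one segment, as in
    Equation (2): H = -sum_j (n_j/N) * log2(n_j/N)."""
    N = len(fragment)
    counts = {}
    for aa in fragment:
        c = class_of(aa)
        counts[c] = counts.get(c, 0) + 1
    return -sum((n / N) * math.log2(n / N) for n in counts.values())

# Charge classes from Section 2.2: positive (KRH), negative (DE), else neutral.
charge_of = lambda aa: "pos" if aa in "KRH" else ("neg" if aa in "DE" else "neu")
```

A segment drawn entirely from one class has zero entropy; an even split over two classes gives the maximum of one bit.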
2.3.3. Propensity factor
The propensity factor was first proposed in the Chou–Fasman method [10] for protein secondary structure prediction. It describes the usage preference of individual amino acids well and has achieved good prediction results in secondary structure prediction. In fact, binding residues themselves have preferences in amino acid usage. Here, we extracted the binding-residue propensity factor as a feature parameter. The formula of the propensity factor is as follows [10]:
(3) P_ij = p_ij / p_i

where p_ij = n_ij / Σ_i n_ij is the frequency of the ith amino acid among the residues of class j, and p_i = Σ_j n_ij / Σ_i Σ_j n_ij is its overall frequency. Here, i represents the 20 amino acids (i = 1, 2, …, 20); j represents binding or unbinding residues (j = 1, 2); and n_ij represents the number of amino acid i among the binding or unbinding residues.
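A sketch of computing propensity factors from residue counts, assuming the ratio-of-frequencies form of Equation (3); the counts below are invented for illustration only.

```python
def propensity_factors(binding_counts, unbinding_counts):
    """P_ij = p_ij / p_i: the frequency of amino acid i within class j
    (binding or unbinding), normalised by its overall frequency. Values
    above 1 mean the amino acid is over-represented in that class."""
    amino_acids = set(binding_counts) | set(unbinding_counts)
    total = {a: binding_counts.get(a, 0) + unbinding_counts.get(a, 0)
             for a in amino_acids}
    grand = sum(total.values())
    result = {}
    for j, counts in (("binding", binding_counts), ("unbinding", unbinding_counts)):
        n_j = sum(counts.values())
        result[j] = {a: (counts.get(a, 0) / n_j) / (total[a] / grand)
                     for a in amino_acids if total[a] > 0}
    return result

# Hypothetical counts: D over-represented, G under-represented among binders.
pf = propensity_factors({"D": 30, "G": 10}, {"D": 10, "G": 50})
```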
2.4. Algorithm
2.4.1. Support vector machine algorithm
Support vector machine is a classical machine learning method. It is widely used in protein structure and binding residue prediction [5, 6]. The core idea is to map the input vector to a high‐dimensional feature space through a non‐linear transformation. Then, by selecting a series of kernel functions and parameter factors, the optimal hyperplane is obtained, which maximises the distance between it and various samples, achieving the greatest generalisation ability. The discriminant function of the optimal hyperplane is as follows:
(4) f(x) = sgn( Σ_{i=1}^{n} α_i y_i K(x_i, x) + b )

where α_i is the Lagrange multiplier, b is the classification threshold, and K(x_i, x) is the inner‐product kernel function. This paper chooses the radial basis function (RBF) kernel:

(5) K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)
The SVM performs excellently on small-sample, non-linear, and high-dimensional pattern recognition problems; however, too many feature parameters may cause over-training. In this paper, the SVM algorithm was run on the Weka 3.8 platform, with C and gamma set to their defaults.
2.4.2. Random forest algorithm
Random forest (RF) is a classification algorithm proposed by Leo Breiman in 2001 [14]. Its main idea is that each decision tree branches on a feature randomly selected from a subset of all features, growing a collection of different decision trees. The RF algorithm generates a random vector to control the growth of each tree in the ensemble, and reduces overfitting by obtaining the final classification through voting. This paper also used the Weka 3.8 platform to implement the RF algorithm, with the size of the random feature subset m = √M (M is the number of feature parameters), the number of decision trees k = 500, and the number of optimisation nodes mtry at its default.
2.4.3. K‐nearest neighbour algorithm
K‐nearest neighbour (KNN) [24] is a statistics-based machine learning classifier proposed by Cover and Hart in 1967. Its basic idea is that the k nearest samples to a test sample are found using a distance formula, and the test sample is then assigned to the category most common among those k samples. The KNN classifier has the advantages of being theoretically mature, easy to understand, and free of training. However, different k values yield different classification results, and the classifier performs best when k takes an appropriate value; choosing a suitable k therefore gives better prediction results. We adopted the KNN classifier on the Weka 3.8 platform.
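The authors ran these three classifiers on the Weka 3.8 platform; as a rough stand-in, the same comparison can be sketched with scikit-learn on toy data (500 trees and m = √M features for RF as stated above, other settings default; the data and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy two-class data standing in for fragment feature vectors.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),  # RBF kernel; C and gamma at their defaults
    "RF": RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                 random_state=0),  # k = 500 trees, m = sqrt(M)
    "KNN": KNeighborsClassifier(n_neighbors=5),  # k must be tuned in practice
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()  # 5-fold cross-validation
          for name, m in models.items()}
```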
2.4.4. Deep neural network algorithm
Deep neural network (DNN) is a common deep learning algorithm that improves a model's discriminative ability by providing higher-level abstraction. Its layers can be divided into input, hidden, and output layers. In most cases the hyperparameters need to be optimised, and presetting a well-optimised set of hyperparameters can significantly improve training efficiency and prediction accuracy. However, deep learning involves many hyperparameters, and optimising them requires significant computing resources and time. Therefore, referring to previous studies [26, 27], we selected the number of hidden layers, the number of nodes per hidden layer, and the batch size for optimisation. The DNN modules were implemented under the Keras framework in Python.
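The authors implemented the DNN under Keras; as a framework-free illustration of the layered architecture (input layer, fully connected hidden layers, sigmoid output for the binding/non-binding decision), a minimal NumPy forward pass with hypothetical layer sizes:

```python
import numpy as np

def dnn_forward(x, weights, biases):
    """Forward pass of a fully connected network: ReLU hidden layers and a
    sigmoid output giving the binding probability."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)   # hidden layer with ReLU activation
    z = a @ weights[-1] + biases[-1]     # output layer
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid probability

rng = np.random.default_rng(0)
# Input dimension and two 64-node hidden layers: hypothetical sizes; the
# hidden-layer count and node count are exactly the hyperparameters tuned.
layer_sizes = [40, 64, 64, 1]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]
probs = dnn_forward(rng.standard_normal((8, 40)), weights, biases)
```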
2.5. The evaluation index
For the evaluation of the prediction results, we used the metrics commonly employed in PMILBR prediction [23, 24, 25, 28, 29]: sensitivity (S n), specificity (S p), accuracy (Acc), and Matthews correlation coefficient (MCC). The expressions are as follows:
(6) S n = TP / (TP + FN)

(7) S p = TN / (TN + FP)

(8) Acc = (TP + TN) / (TP + TN + FP + FN)

(9) MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
In the formula, the number of metal ion ligand binding residues correctly predicted is TP, otherwise it is FN; the number of metal ion ligand unbinding residues correctly predicted is TN, otherwise it is FP.
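The four indices can be computed directly from the confusion counts; a sketch:

```python
import math

def evaluation_indices(tp, fn, tn, fp):
    """S_n, S_p, Acc and MCC as defined in Equations (6)-(9)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # guard empty classes
    return sn, sp, acc, mcc
```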
3. RESULTS AND DISCUSSION
For the prediction of PMILBRs, the frequently used testing methods are 5‐fold cross‐validation and independent testing [10, 13, 23, 24, 25, 30]. Therefore, this paper also uses these two testing methods to evaluate our algorithm models. The prediction flow chart is shown in Figure 3.
FIGURE 3.
Flow chart for the prediction of PMILBRs. PS, 2L, H, F, V represent component information, site conservation information, information entropy, propensity factors and factor values. aa, ss, sa, φ, ψ, wx, le, dh, qs, 10 factors, respectively, represent amino acids, secondary structure, relative solvent accessibility, dihedral angle, disorder, energy, charge, hydrophilicity and 10 orthogonal factors; RF, SVM, KNN and DNN represent RF algorithm, SVM algorithm, KNN algorithm and DNN. DNN, deep neural network; KNN, k‐nearest neighbour; PMILBRs, protein‐metal ion ligand binding residues; RF, random forest; SVM, support vector machine.
3.1. 5‐Fold cross‐validation prediction results
On the training samples, we extracted the component information and site conservation information, the information entropy of charge and hydrophilicity, the binding residue propensity factors, and the 10 orthogonal factors as prediction parameters. These were input into the RF, SVM, KNN and DNN algorithms, respectively. The prediction results of PMILBRs under 5‐fold cross‐validation are shown in Table 2.
TABLE 2.
5‐Fold cross‐validation prediction results.
Ligand | Modal | S n (%) | S p (%) | Acc (%) | MCC |
---|---|---|---|---|---|
Ca2+ | RF | 81.0 | 81.8 | 81.4 | 0.628 |
Liu's [14] | 94.8 | 85.5 | 90.2 | 0.807 | |
SVM | 78.6 | 77.8 | 78.2 | 0.564 | |
Wang's [25] | 69.5 | 77.8 | 73.6 | 0.474 | |
KNN | 70.6 | 81.9 | 76.2 | 0.528 | |
Liu's [24] | 65.3 | 76.2 | 70.8 | 0.418 | |
DNN | 75.3 | 80.7 | 78.5 | 0.542 | |
Mg2+ | RF | 70.8 | 79.5 | 75.1 | 0.506 |
Liu's [14] | 88.2 | 84.9 | 86.5 | 0.731 | |
SVM | 70.5 | 81.3 | 75.7 | 0.519 | |
Wang's [25] | 70.4 | 75.0 | 72.7 | 0.454 | |
KNN | 70.4 | 74.7 | 72.5 | 0.451 | |
Liu's [24] | 66.7 | 72.6 | 69.7 | 0.394 | |
DNN | 70.9 | 73.1 | 72.0 | 0.441 | |
Zn2+ | RF | 94.1 | 93.8 | 93.9 | 0.879 |
Liu's [14] | 93.0 | 93.2 | 93.1 | 0.862 | |
SVM | 94.9 | 94.0 | 94.0 | 0.880 | |
Wang's [25] | 94.8 | 83.6 | 89.2 | 0.789 | |
KNN | 89.8 | 90.7 | 90.3 | 0.805 | |
Liu's [24] | 94.3 | 83.8 | 89.1 | 0.786 | |
DNN | 91.9 | 89.8 | 90.9 | 0.811 | |
Mn2+ | RF | 84.1 | 88.6 | 86.3 | 0.728 |
Liu's [14] | 84.9 | 89.6 | 87.3 | 0.747 | |
SVM | 84.9 | 87.8 | 86.4 | 0.732 | |
Wang's [25] | 79.8 | 84.9 | 82.3 | 0.648 | |
KNN | 83.2 | 80.3 | 81.8 | 0.636 | |
Liu's [24] | 79.1 | 80.9 | 80.0 | 0.600 | |
DNN | 84.7 | 85.2 | 84.9 | 0.699 | |
Fe2+ | RF | 90.4 | 93.2 | 91.6 | 0.833 |
Liu's [14] | 90.3 | 90.1 | 90.2 | 0.804 | |
SVM | 91.5 | 93.4 | 92.5 | 0.849 | |
Wang's [25] | 91.4 | 90.3 | 90.8 | 0.817 | |
KNN | 92.6 | 80.0 | 86.2 | 0.732 | |
Liu's [24] | 92.1 | 80.4 | 86.3 | 0.730 | |
DNN | 93.7 | 87.4 | 90.5 | 0.814 | |
Fe3+ | RF | 85.8 | 93.0 | 89.4 | 0.790 |
Liu's [14] | 86.4 | 92.3 | 89.4 | 0.789 | |
SVM | 89.7 | 89.8 | 89.4 | 0.795 | |
Wang's [25] | 84.1 | 87.0 | 85.6 | 0.712 | |
KNN | 86.9 | 80.7 | 83.8 | 0.677 | |
Liu's [24] | 84.6 | 84.9 | 84.7 | 0.694 | |
DNN | 85.9 | 89.2 | 87.5 | 0.751 | |
Co2+ | RF | 80.5 | 88.0 | 84.2 | 0.687 |
Liu's [14] | 86.1 | 88.2 | 87.1 | 0.743 | |
SVM | 81.3 | 88.4 | 84.8 | 0.699 | |
Wang's [25] | 76.2 | 84.8 | 80.5 | 0.613 | |
KNN | 78.2 | 75.8 | 77.0 | 0.540 | |
Liu's [24] | 77.6 | 83.1 | 80.3 | 0.608 | |
DNN | 79.2 | 82.3 | 80.7 | 0.617 | |
Cu2+ | RF | 88.7 | 95.6 | 92.1 | 0.845 |
Liu's [14] | 87.8 | 93.4 | 90.6 | 0.814 | |
SVM | 93.8 | 89.8 | 91.8 | 0.837 | |
Wang's [25] | 91.8 | 90.7 | 91.2 | 0.825 | |
KNN | 91.1 | 91.1 | 91.1 | 0.823 | |
Liu's [24] | 92.4 | 86.6 | 89.5 | 0.791 | |
DNN | 95.8 | 75.8 | 85.7 | 0.731 | |
K+ | RF | 91.8 | 93.8 | 94.8 | 0.917 |
Liu's [14] | 89.3 | 71.0 | 80.2 | 0.614 | |
SVM | 80.8 | 92.8 | 90.3 | 0.822 | |
Wang's [25] | 78.1 | 73.6 | 75.9 | 0.518 | |
KNN | 85.0 | 97.0 | 91.0 | 0.826 | |
Liu's [24] | 75.1 | 59.7 | 67.4 | 0.353 | |
DNN | 78.5 | 63.8 | 71.2 | 0.431 | |
Na+ | RF | 86.2 | 85.3 | 85.8 | 0.715 |
Liu's [14] | 88.1 | 76.3 | 82.2 | 0.649 | |
SVM | 83.6 | 85.0 | 89.7 | 0.759 | |
Wang's [25] | 79.6 | 79.6 | 79.6 | 0.591 | |
KNN | 84.2 | 61.3 | 72.4 | 0.468 | |
Liu's [24] | 64.6 | 73.0 | 68.8 | 0.378 | |
DNN | 83.3 | 78.6 | 80.9 | 0.620 |
Overall, the S n , S p , and Acc values of the 10 metal ion ligands under the RF algorithm exceeded 70.8%, 79.5%, and 75.1%, respectively, with MCC values exceeding 0.506. Under the SVM algorithm, they exceeded 70.5%, 77.8%, and 75.7%, respectively, with MCC values exceeding 0.519; in particular, the S p value of Zn2+ reached 94.0%. The S n , S p , and Acc values under the KNN and DNN algorithms exceeded 70.4%, with MCC values exceeding 0.431.
It can be seen from Table 2 that the various ligands have different prediction results under the four algorithms, indicating that each algorithm has its own advantages. Taking Mg2+ and Fe2+ in Figure 4 as examples, panels (A) and (B) show the four evaluation indicators for the Mg2+ and Fe2+ ligands, respectively. The S n values of Mg2+ did not differ significantly across the four classifiers, while its S p , Acc and MCC values under RF and SVM were better than those under KNN and DNN. For Fe2+, the S n values were better under KNN and DNN, and the S p , Acc and MCC values under RF, SVM and DNN were better than those under KNN. Relatively speaking, the SVM algorithm demonstrated superior predictive performance for both ion ligands.
FIGURE 4.
5‐Fold cross‐validation results of different algorithms for Mg2+ and Fe2+.
To better illustrate that the fused feature parameters are useful for predicting PMILBRs, we compared our prediction results with those of Wang [25] and Liu [14, 24], the best results published to date (see Table 2); Liu and Wang used the RF [14], KNN [24], and SVM [25] algorithms in those studies. The prediction results for Fe2+ under RF, SVM and KNN were better than Wang's [25] and Liu's [14, 24], with the S p , Acc and MCC values under RF and SVM exceeding the previous results. Comparing the RF algorithm with Liu's [14] results, the predictions for Zn2+, Fe2+, Fe3+, Cu2+, K+ and Na+ were better, while those for Mn2+ and Co2+ were slightly worse and those for Ca2+ and Mg2+ did not match Liu's [14] results. Comparing the SVM algorithm with Wang's [25] results, the predictions for all 10 metal ions were better. Comparing the KNN algorithm with Liu's [24] results, the results for Fe3+ and Co2+ were slightly lower, while the predictions for the other 8 metal ions were better. These comparisons demonstrate that fusing the prediction parameters yields better results.
3.2. Prediction results of independent test
To test the reliability and practicability of the prediction model, the PMILBRs were tested independently [23, 24, 25]. The prediction results were shown in Table 3.
TABLE 3.
Prediction results of the independent test.
Ligand | Modal | S n (%) | S p (%) | Acc (%) | MCC |
---|---|---|---|---|---|
Ca2+ | RF | 63.7 | 83.5 | 89.3 | 0.161 |
Liu's [14] | 51.1 | 88.7 | 88.1 | 0.163 | |
SVM | 65.8 | 77.9 | 86.0 | 0.150 | |
Wang's [25] | 67.5 | 79.8 | 79.6 | 0.154 | |
KNN | 63.3 | 81.2 | 88.0 | 0.145 | |
DNN | 76.3 | 89.8 | 85.7 | 0.177 | |
Mg2+ | RF | 60.1 | 85.6 | 86.5 | 0.127 |
Liu's [14] | 74.6 | 81.8 | 81.7 | 0.150 | |
SVM | 54.2 | 86.9 | 91.8 | 0.125 | |
Wang's [25] | 72.4 | 80.0 | 79.9 | 0.140 | |
KNN | 70.0 | 68.9 | 80.6 | 0.087 | |
DNN | 66.7 | 88.1 | 87.9 | 0.184 | |
Zn2+ | RF | 94.0 | 97.9 | 98.2 | 0.612 |
Liu's [14] | 92.2 | 90.7 | 90.7 | 0.326 | |
SVM | 93.8 | 96.7 | 97.5 | 0.528 | |
Wang's [25] | 93.0 | 89.8 | 89.9 | 0.315 | |
KNN | 91.9 | 90.6 | 93.9 | 0.328 | |
DNN | 92.1 | 94.7 | 92.1 | 0.367 | |
Mn2+ | RF | 79.7 | 96.4 | 97.2 | 0.420 |
Liu's [14] | 72.9 | 91.9 | 91.7 | 0.262 | |
SVM | 80.9 | 91.5 | 94.4 | 0.287 | |
Wang's [25] | 76.8 | 87.2 | 87.1 | 0.215 | |
KNN | 84.5 | 77.5 | 86.1 | 0.170 | |
DNN | 80.3 | 92.6 | 93.4 | 0.284 | |
Fe2+ | RF | 90.3 | 92.2 | 95.0 | 0.336 |
Liu's [14] | 79.0 | 93.7 | 93.5 | 0.333 | |
SVM | 94.4 | 86.3 | 91.6 | 0.255 | |
Wang's [25] | 87.7 | 85.6 | 85.6 | 0.242 | |
KNN | 86.1 | 66.3 | 78.7 | 0.124 | |
DNN | 91.8 | 95.8 | 92.0 | 0.342 | |
Fe3+ | RF | 73.2 | 96.1 | 96.9 | 0.371 |
Liu's [14] | 72.7 | 94.3 | 94.0 | 0.316 | |
SVM | 84.3 | 88.1 | 92.4 | 0.265 | |
Wang's [25] | 81.3 | 88.0 | 87.9 | 0.243 | |
KNN | 87.7 | 75.1 | 84.5 | 0.176 | |
DNN | 83.8 | 92.2 | 92.1 | 0.327 | |
Co2+ | RF | 71.3 | 97.9 | 97.9 | 0.492 |
Liu's [14] | 75.6 | 87.6 | 87.4 | 0.229 | |
SVM | 75.6 | 90.0 | 93.4 | 0.260 | |
Wang's [25] | 75.6 | 86.2 | 86.1 | 0.215 | |
KNN | 75.0 | 83.5 | 89.6 | 0.191 | |
DNN | 86.4 | 90.2 | 86.9 | 0.326 | |
Cu2+ | RF | 82.0 | 98.0 | 98.2 | 0.534 |
Liu's [14] | 88.0 | 93.9 | 93.8 | 0.399 | |
SVM | 66.7 | 93.3 | 95.0 | 0.287 | |
Wang's [25] | 89.8 | 93.0 | 92.9 | 0.381 | |
KNN | 79.8 | 96.7 | 97.3 | 0.434 | |
DNN | 92.8 | 95.5 | 92.9 | 0.387 | |
K+ | RF | 73.3 | 64.7 | 76.2 | 0.131 |
Liu's [14] | 87.2 | 51.2 | 52.3 | 0.133 | |
SVM | 67.6 | 61.1 | 73.5 | 0.097 | |
Wang's [25] | 73.6 | 70.5 | 70.6 | 0.165 | |
KNN | 84.8 | 54.0 | 68.2 | 0.129 | |
DNN | 63.2 | 89.5 | 65.2 | 0.051 | |
Na+ | RF | 57.7 | 73.4 | 82.9 | 0.094 |
Liu's [14] | 54.3 | 72.8 | 72.5 | 0.076 | |
SVM | 61.9 | 66.7 | 78.4 | 0.081 | |
Wang's [25] | 39.5 | 89.7 | 88.9 | 0.118 | |
KNN | 60.8 | 73.2 | 82.8 | 0.102 | |
DNN | 70.2 | 54.6 | 70.0 | 0.072 |
It can be seen from Table 3 that the S n , S p , and Acc values of the Mg2+, Ca2+, K+ and Na+ ion ligands are all above 54%, with a highest MCC value of 0.184. The S n , S p , and Acc values of the Fe2+, Fe3+, Co2+, Cu2+, Mn2+, and Zn2+ ligands are all above 66.0%, with a highest MCC value of 0.612. In particular, the S n , S p and Acc values of Zn2+ exceed 91%, with MCC values above 0.328.
Using Mg2+ and Fe2+ as examples in Figure 5, panels (A) and (B) display the four evaluation indicators for Mg2+ and Fe2+ ligands, respectively. It is evident from Figure 5 that for Mg2+, the RF algorithm achieved the highest S n value, while DNN outperformed RF, SVM, and KNN in terms of S p and MCC. Conversely, SVM exhibited higher Acc values than the other three algorithms. For Fe2+, the S n , S p , Acc, and MCC values obtained using RF, SVM, and DNN were all superior to those of KNN. Relatively speaking, the DNN algorithm demonstrated superior predictive performance for both types of ion ligands.
FIGURE 5.
Independent test results of different algorithms for Mg2+ and Fe2+.
To facilitate comparison, we also list the state‐of‐the‐art independent test results of Wang [25] and Liu [14] in Table 3. The S n , S p , Acc, and MCC values of Zn2+, Mn2+, Fe3+, and Na+ were better than Liu's [14] predictions, and those of Zn2+, Mn2+, Fe2+, Fe3+, and Co2+ were better than Wang's [25]. The predictions for Ca2+ and Mg2+ were best under the DNN algorithm, with S n values above 66.7% and MCC values above 0.177, showing that deep learning has an advantage in predicting PMILBRs on large-scale data. Thus, fusing multiple features as prediction parameters can increase PMILBR prediction performance.
In summary, it is evident that different algorithms exhibit varying capabilities in recognising metal ions. Taking Mg2+ and Fe2+ as examples, the SVM algorithm performed best in 5‐fold cross‐validation, whereas the DNN algorithm excelled in independent validation.
In fact, for a given protein chain, the questions of interest are which ion ligand it can bind, where the binding sites are, and which residues bind. We therefore provide prediction models under a one‐to‐one strategy to answer these questions.
To this end, we constructed the verification matrix of the trained models against the test sets, taking PMILBRs under the RF algorithm as an example (Table 4). When the independent test sets of the different ionic ligands were put into the Ca2+‐trained model, the Ca2+ test set gave the best prediction results. The values on the diagonal of the verification matrix were generally optimal, showing that each prediction model performs best on its own metal ion ligand and indicating that the models can effectively predict real PMILBRs. Table 4 also shows that the RF prediction models for Zn2+, Cu2+, Co2+ and Mn2+ performed best. There were exceptions: for example, when the test sets of the different ions were applied to the K+‐trained model, the prediction accuracy for Fe2+ and Fe3+ surpassed that for K+.
TABLE 4.
Validation matrix of 10 metal ion ligands based on the random forest (RF) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.161 | 0.112 | 0.156 | 0.143 | 0.107 | 0.098 | 0.109 | 0.126 | 0.094 | 0.056 |
train_Mg2+ | 0.098 | 0.127 | 0.110 | 0.109 | 0.092 | 0.107 | 0.036 | 0.039 | 0.010 | 0.117 |
train_Zn2+ | 0.234 | 0.321 | 0.612 | 0.150 | 0.168 | 0.197 | 0.173 | 0.194 | 0.170 | 0.135 |
train_Mn2+ | 0.221 | 0.245 | 0.308 | 0.420 | 0.123 | 0.112 | 0.042 | 0.096 | 0.098 | 0.074 |
train_Fe2+ | 0.170 | 0.124 | 0.081 | 0.106 | 0.336 | 0.325 | 0.157 | 0.221 | 0.105 | 0.102 |
train_Fe3+ | 0.113 | 0.116 | 0.156 | 0.158 | 0.356 | 0.371 | 0.111 | 0.159 | 0.123 | 0.121 |
train_Co2+ | 0.090 | 0.056 | 0.101 | 0.083 | 0.162 | 0.161 | 0.492 | 0.131 | 0.070 | 0.103 |
train_Cu2+ | 0.230 | 0.189 | 0.201 | 0.109 | 0.206 | 0.194 | 0.214 | 0.534 | 0.109 | 0.107 |
train_K+ | 0.116 | 0.067 | 0.108 | 0.072 | 0.254 | 0.225 | 0.097 | 0.053 | 0.131 | 0.127 |
train_Na+ | 0.071 | 0.078 | 0.088 | 0.054 | 0.030 | 0.060 | 0.069 | 0.031 | 0.033 | 0.094 |
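The diagonal pattern described above can be checked mechanically. The sketch below transcribes the rows of Table 4 and finds, for each trained model, which test set scores highest; it confirms that every model except the one trained on K+ peaks on its own ion (the K+ model peaks on Fe2+).

```python
ions = ["Ca2+", "Mg2+", "Zn2+", "Mn2+", "Fe2+", "Fe3+", "Co2+", "Cu2+", "K+", "Na+"]

# Rows of Table 4: score of each ion's independent test set on each trained RF model.
rf_matrix = {
    "Ca2+": [0.161, 0.112, 0.156, 0.143, 0.107, 0.098, 0.109, 0.126, 0.094, 0.056],
    "Mg2+": [0.098, 0.127, 0.110, 0.109, 0.092, 0.107, 0.036, 0.039, 0.010, 0.117],
    "Zn2+": [0.234, 0.321, 0.612, 0.150, 0.168, 0.197, 0.173, 0.194, 0.170, 0.135],
    "Mn2+": [0.221, 0.245, 0.308, 0.420, 0.123, 0.112, 0.042, 0.096, 0.098, 0.074],
    "Fe2+": [0.170, 0.124, 0.081, 0.106, 0.336, 0.325, 0.157, 0.221, 0.105, 0.102],
    "Fe3+": [0.113, 0.116, 0.156, 0.158, 0.356, 0.371, 0.111, 0.159, 0.123, 0.121],
    "Co2+": [0.090, 0.056, 0.101, 0.083, 0.162, 0.161, 0.492, 0.131, 0.070, 0.103],
    "Cu2+": [0.230, 0.189, 0.201, 0.109, 0.206, 0.194, 0.214, 0.534, 0.109, 0.107],
    "K+":   [0.116, 0.067, 0.108, 0.072, 0.254, 0.225, 0.097, 0.053, 0.131, 0.127],
    "Na+":  [0.071, 0.078, 0.088, 0.054, 0.030, 0.060, 0.069, 0.031, 0.033, 0.094],
}

# For each trained model, find the test set with the highest score.
best_test = {trained: ions[row.index(max(row))] for trained, row in rf_matrix.items()}
# Off-diagonal winners are the exceptions noted in the text.
exceptions = {t: b for t, b in best_test.items() if t != b}
```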
The validation matrices for the other algorithms are provided in the Appendix (Tables A1, A2, A3). From the validation matrices of the different algorithms, we determined the best prediction model for each ion ligand: the DNN algorithm excels in predicting Ca2+, Mg2+, and Fe2+; the RF algorithm performs best for Zn2+, Mn2+, Fe3+, Co2+, Cu2+, and K+; and the KNN algorithm yields the optimal prediction model for Na+.
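Selecting the best algorithm per ion amounts to comparing the same‐ion (diagonal) entries across the four matrices. A short sketch, with the diagonals transcribed from Tables 4 and A1, A2, A3:

```python
ions = ["Ca2+", "Mg2+", "Zn2+", "Mn2+", "Fe2+", "Fe3+", "Co2+", "Cu2+", "K+", "Na+"]

# Diagonal entries (same-ion test score) of each algorithm's validation matrix.
diagonals = {
    "RF":  [0.161, 0.127, 0.612, 0.420, 0.336, 0.371, 0.492, 0.534, 0.131, 0.094],
    "SVM": [0.150, 0.125, 0.528, 0.287, 0.255, 0.265, 0.260, 0.287, 0.097, 0.081],
    "KNN": [0.145, 0.087, 0.328, 0.170, 0.124, 0.176, 0.191, 0.434, 0.129, 0.102],
    "DNN": [0.177, 0.184, 0.367, 0.284, 0.342, 0.327, 0.326, 0.387, 0.051, 0.072],
}

# For each ion, pick the algorithm with the highest same-ion score.
best_model = {
    ion: max(diagonals, key=lambda alg: diagonals[alg][i])
    for i, ion in enumerate(ions)
}
```

Running this reproduces the assignment stated in the text (DNN for Ca2+, Mg2+, Fe2+; RF for Zn2+, Mn2+, Fe3+, Co2+, Cu2+, K+; KNN for Na+).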
4. CONCLUSION
Precisely predicting PMILBRs is critical for comprehending protein function. In this research, we first analysed PMILBRs using a combination of single‐residue and fragment information, which provides a more comprehensive view because specific amino acids are preferred at binding sites and neighbouring residues influence the residue combination. We also integrated beneficial biological information to avoid losses of information that would reduce prediction accuracy. From the biological background of binding sites, structural information and physicochemical characteristics are important factors affecting the binding of ion ligands to proteins. Therefore, based on the primary sequence of the protein, we extracted the corresponding predicted secondary structure information (secondary structure, dihedral angles, and surface accessibility), tertiary structure information (disorder values and 10 orthogonal factors), and physicochemical characteristics (hydrophobicity, charge, and energy), and fused the amino acids with this derived information as predictive feature parameters. Finally, we selected four prediction algorithms and screened for the optimal prediction model based on the prediction results. Because independent test samples are completely unrelated to the training samples, independent testing can fully verify the practicality of the model.
In this article, a comprehensive feature set comprising amino acids, the physicochemical characteristics of three amino acids, predicted secondary structure information, 10 orthogonal factors, and disorder values was used as the prediction parameters for various algorithms to predict PMILBRs. Following 5‐fold cross‐validation and independent testing, we obtained an effective prediction method for PMILBRs. To solve the practical problem of determining which metal ion ligands a given protein chain can bind, we proposed a validation matrix under a "one‐to‐one" strategy to discover PMILBRs. The results showed that the diagonal values of the validation matrix are the best, indicating that the prediction models can effectively predict PMILBR binding. Therefore, the prediction model obtained by fusing multiple feature parameters can serve as a valuable tool for predicting PMILBRs.
AUTHOR CONTRIBUTIONS
Caiyun Yang: Data curation; software; writing – original draft. Xiuzhen Hu: Project administration; writing – review & editing. Zhenxing Feng: Project administration; writing – review & editing. Sixi Hao: Resources; software. Gaimei Zhang: Formal analysis; resources. Shaohua Chen: Data curation; writing – review & editing. Guodong Guo: Data curation; writing – review & editing.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
PATIENT CONSENT STATEMENT
Not applicable.
PERMISSION TO REPRODUCE MATERIAL FROM OTHER SOURCES
Not applicable.
CLINICAL TRIAL REGISTRATION
Not applicable.
ACKNOWLEDGEMENT
This work was supported by the Natural Science Foundation of China (61961032), the Natural Science Foundation of Inner Mongolia of China (2024MS06027), and the Basic Scientific Research Operating Expenses of Inner Mongolia of China (JY20230067).
APPENDIX A
TABLE A1.
Validation matrix of 10 metal ion ligands based on the support vector machine (SVM) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.150 | 0.075 | 0.023 | 0.062 | 0.127 | 0.114 | 0.141 | 0.028 | 0.058 | 0.086 |
train_Mg2+ | 0.109 | 0.125 | 0.045 | 0.112 | 0.101 | 0.121 | 0.100 | 0.108 | 0.015 | 0.118 |
train_Zn2+ | 0.234 | 0.320 | 0.528 | 0.140 | 0.246 | 0.276 | 0.183 | 0.208 | 0.145 | 0.127 |
train_Mn2+ | 0.090 | 0.076 | 0.251 | 0.287 | 0.124 | 0.113 | 0.114 | 0.157 | 0.094 | 0.095 |
train_Fe2+ | 0.115 | 0.078 | 0.067 | 0.079 | 0.255 | 0.215 | 0.107 | 0.084 | 0.086 | 0.096 |
train_Fe3+ | 0.135 | 0.080 | 0.137 | 0.100 | 0.214 | 0.265 | 0.161 | 0.152 | 0.101 | 0.118 |
train_Co2+ | 0.062 | 0.037 | 0.123 | 0.051 | 0.042 | 0.080 | 0.260 | 0.095 | 0.114 | 0.085 |
train_Cu2+ | 0.024 | 0.014 | 0.146 | 0.015 | 0.009 | 0.019 | 0.063 | 0.287 | 0.111 | 0.035 |
train_K+ | 0.056 | 0.040 | 0.090 | 0.066 | 0.054 | 0.057 | 0.093 | 0.041 | 0.097 | 0.025 |
train_Na+ | 0.076 | 0.059 | 0.065 | 0.069 | 0.073 | 0.074 | 0.056 | 0.049 | 0.047 | 0.081 |
TABLE A2.
Validation matrix of 10 metal ion ligands based on the k‐nearest neighbour (KNN) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.145 | 0.124 | 0.076 | 0.098 | 0.083 | 0.108 | 0.062 | 0.089 | 0.090 | 0.060 |
train_Mg2+ | 0.065 | 0.087 | 0.037 | 0.054 | 0.092 | 0.069 | 0.070 | 0.043 | 0.048 | 0.059 |
train_Zn2+ | 0.197 | 0.128 | 0.328 | 0.210 | 0.056 | 0.029 | 0.035 | 0.041 | 0.018 | 0.010 |
train_Mn2+ | 0.064 | 0.021 | 0.092 | 0.170 | 0.040 | 0.042 | 0.098 | 0.019 | 0.034 | 0.018 |
train_Fe2+ | 0.083 | 0.073 | 0.065 | 0.093 | 0.124 | 0.127 | 0.042 | 0.038 | 0.025 | 0.067 |
train_Fe3+ | 0.099 | 0.131 | 0.152 | 0.107 | 0.127 | 0.176 | 0.024 | 0.018 | 0.014 | 0.035 |
train_Co2+ | 0.077 | 0.098 | 0.106 | 0.012 | 0.090 | 0.156 | 0.191 | 0.127 | 0.106 | 0.089 |
train_Cu2+ | 0.085 | 0.073 | 0.140 | 0.145 | 0.315 | 0.245 | 0.204 | 0.434 | 0.089 | 0.067 |
train_K+ | 0.076 | 0.110 | 0.107 | 0.115 | 0.090 | 0.109 | 0.092 | 0.117 | 0.129 | 0.021 |
train_Na+ | 0.094 | 0.081 | 0.022 | 0.055 | 0.061 | 0.053 | 0.078 | 0.070 | 0.075 | 0.102 |
TABLE A3.
Validation matrix of 10 metal ion ligands based on the deep neural network (DNN) algorithm.
| | test_Ca2+ | test_Mg2+ | test_Zn2+ | test_Mn2+ | test_Fe2+ | test_Fe3+ | test_Co2+ | test_Cu2+ | test_K+ | test_Na+ |
|---|---|---|---|---|---|---|---|---|---|---|
train_Ca2+ | 0.177 | 0.080 | 0.114 | 0.105 | 0.171 | 0.212 | 0.134 | 0.117 | 0.122 | 0.035 |
train_Mg2+ | 0.129 | 0.184 | 0.174 | 0.124 | 0.260 | 0.251 | 0.208 | 0.269 | 0.097 | 0.098 |
train_Zn2+ | 0.053 | 0.024 | 0.367 | 0.139 | 0.265 | 0.232 | 0.099 | 0.216 | 0.006 | 0.051 |
train_Mn2+ | 0.064 | 0.097 | 0.104 | 0.284 | 0.203 | 0.197 | 0.089 | 0.174 | 0.058 | 0.067 |
train_Fe2+ | 0.102 | 0.089 | 0.145 | 0.189 | 0.342 | 0.267 | 0.105 | 0.201 | 0.089 | 0.092 |
train_Fe3+ | 0.143 | 0.079 | 0.178 | 0.201 | 0.310 | 0.327 | 0.176 | 0.189 | 0.053 | 0.021 |
train_Co2+ | 0.099 | 0.101 | 0.107 | 0.174 | 0.243 | 0.296 | 0.326 | 0.186 | 0.064 | 0.048 |
train_Cu2+ | 0.107 | 0.089 | 0.092 | 0.168 | 0.289 | 0.307 | 0.273 | 0.387 | 0.092 | 0.071 |
train_K+ | 0.010 | 0.045 | 0.017 | 0.032 | 0.047 | 0.028 | 0.015 | 0.009 | 0.051 | 0.044 |
train_Na+ | 0.049 | 0.037 | 0.056 | 0.028 | 0.068 | 0.058 | 0.066 | 0.047 | 0.060 | 0.072 |
Yang, C. , et al.: The optimised model of predicting protein‐metal ion ligand binding residues. IET Syst. Biol. e70001 (2025). 10.1049/syb2.70001
Contributor Information
Xiuzhen Hu, Email: hxz@imut.edu.cn.
Zhenxing Feng, Email: zxfeng@imut.edu.cn.
DATA AVAILABILITY STATEMENT
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
REFERENCES
1. Akam, E.A., et al.: Disulfide‐masked iron prochelators: effects on cell death, proliferation, and hemoglobin production. J. Inorg. Biochem. 180, 186–193 (2018). 10.1016/j.jinorgbio.2017.12.016
2. Brailoiu, E., et al.: Mechanisms of modulation of brain microvascular endothelial cells function by thrombin. Brain Res. 1657, 167–175 (2016). 10.1016/j.brainres.2016.12.011
3. Reif, D.W.: Ferritin as a source of iron for oxidative damage. Free Radical Biol. Med. 12(5), 417–427 (1992). 10.1016/0891-5849(92)90091-t
4. Reed, G.H., Poyner, R.R.: Mn2+ as a probe of divalent metal ion binding and function in enzymes and other proteins. Met. Ions Biol. Syst. 37(12), 183–207 (2000)
5. Jiang, Z., et al.: Identification of Ca(2+)‐binding residues of a protein from its primary sequence. Genet. Mol. Res. 15(2) (2016). 10.4238/gmr.15027618
6. Cao, X.Y., et al.: Identification of metal ion binding sites based on amino acid sequences. PLoS One 12(8), e0183756 (2017). 10.1371/journal.pone.0183756
7. Horst, J.A., Samudrala, R.: A protein sequence meta‐functional signature for calcium binding residue prediction. Pattern Recogn. Lett. 31(14), 2103–2112 (2010). 10.1016/j.patrec.2010.04.012
8. Mazumder, M., et al.: Prediction and analysis of canonical EF hand loop and qualitative estimation of Ca2+ binding affinity. PLoS One 9(4), e96202 (2014). 10.1371/journal.pone.0096202
9. Hu, X.Z., et al.: Recognizing metal and acid radical ion‐binding sites by integrating ab initio modeling with template‐based transferals. Bioinformatics 32(21), 3260–3269 (2016). 10.1093/bioinformatics/btw396
10. Xu, S., et al.: Recognition of metal ion ligand‐binding residues by adding correlation features and propensity factors. Front. Genet. 12, 793800 (2022). 10.3389/fgene.2021.793800
11. Josef, P., Ingvar, E., Rein, A.: A new method for identification of protein (sub)families in a set of proteins based on hydropathy distribution in proteins. Proteins: Struct., Funct., Bioinf. 58(4), 923–934 (2010)
12. Taylor, W.R.: The classification of amino acid conservation. J. Theor. Biol. 119(2), 205–218 (1986). 10.1016/s0022-5193(86)80075-3
13. Wang, S., et al.: Recognizing ion ligand binding sites by SMO algorithm. BMC Mol. Cell Biol. 20(Suppl 3), 53 (2019). 10.1186/s12860-019-0237-9
14. Liu, L., et al.: Recognizing ion ligand‐binding residues by random forest algorithm based on optimized dihedral angle. Front. Bioeng. Biotechnol. 8, 493 (2020). 10.3389/fbioe.2020.00493
15. Wu, S., Zhang, Y.: ANGLOR: a composite machine‐learning algorithm for protein backbone torsion angle prediction. PLoS One 3(10), e3400 (2008). 10.1371/journal.pone.0003400
16. Cui, Y.F., et al.: Predicting protein‐ligand binding residues with deep convolutional neural networks. BMC Bioinf. 20(1), 93 (2019). 10.1186/s12859-019-2672-1
17. Anfinsen, C.B., et al.: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. 47(9), 1309–1314 (1961). 10.1073/pnas.47.9.1309
18. Dunker, A.K., et al.: Intrinsic disorder and protein function. Biochemistry 41(21), 6573–6582 (2002). 10.1021/bi012159+
19. Noivirt‐Brik, O., Prilusky, J., Sussman, J.L.: Assessment of disorder predictions in CASP8. Proteins: Struct., Funct., Bioinf. 77(Suppl 9), 210–216 (2009). 10.1002/prot.22586
20. Hao, S.X., et al.: Prediction of metal ion ligand binding residues by adding disorder value and propensity factors based on deep learning algorithm. Front. Genet. 13, 969412 (2022). 10.3389/fgene.2022.969412
21. Bálint, M., Gábor, E., Zsuzsanna, D.: IUPred2A: context‐dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46(W1), W329–W337 (2018). 10.1093/nar/gky384
22. Erdős, G., Dosztányi, Z.: Analyzing protein disorder with IUPred2A. Curr. Prot. Bioinf. 70(1) (2020). 10.1002/cpbi.99
23. You, X.X., et al.: Recognizing protein‐metal ion ligands binding residues by random forest algorithm with adding orthogonal properties. Comput. Biol. Chem. 98, 107693 (2022). 10.1016/j.compbiolchem.2022.107693
24. Liu, L., et al.: Prediction of acid radical ion binding residues by K‐nearest neighbors classifier. BMC Mol. Cell Biol. 20(Suppl 3), 52–61 (2019). 10.1186/s12860-019-0238-8
25. Wang, S., et al.: Recognition of ion ligand binding sites based on amino acid features with the fusion of energy, physicochemical and structural features. Curr. Pharmaceut. Des. 27(8), 1093–1102 (2021). 10.2174/1381612826666201029100636
26. Young, S.R., et al.: Optimizing deep learning hyper‐parameters through an evolutionary algorithm. In: MLHPC '15: Proceedings of the Workshop on Machine Learning in High‐Performance Computing Environments, vol. 4, pp. 15 (2015)
27. Koutsoukas, A., et al.: Deep‐learning: investigating deep neural networks hyper‐parameters and comparison of performance to shallow methods for modeling bioactivity data. J. Cheminf. 9(1), 42 (2017). 10.1186/s13321-017-0226-y
28. Han, H., et al.: Interpretable machine learning assessment. Neurocomputing 561, 126891 (2023). 10.1016/j.neucom.2023.126891
29. Zou, X., et al.: Accurately identifying hemagglutinin using sequence information and machine learning methods. Front. Med. 10, 1281880 (2023). 10.3389/fmed.2023.1281880
30. Zulfiqar, H., et al.: Deep‐STP: a deep learning‐based approach to predict snake toxin proteins by using word embeddings. Front. Med. 10, 1291352 (2024). 10.3389/fmed.2023.1291352