BioMed Research International. 2016 May 23;2016:4783801. doi: 10.1155/2016/4783801

Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

Ji-Yong An 1, Fan-Rong Meng 1,*, Zhu-Hong You 1,2,*, Yu-Hong Fang 1, Yu-Jun Zhao 1, Ming Zhang 1
PMCID: PMC4893571  PMID: 27314023

Abstract

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets and achieve accuracies of 92.65% and 97.92%, respectively, which are higher than those reported in previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results show that our RVM-LPQ method outperforms the SVM-based method. These results suggest that the proposed method is efficient and simple and can serve as an automatic decision support tool for future proteomics research.

1. Introduction

Proteins are crucial molecules that participate in many cellular functions in an organism. Typically, proteins do not perform their roles individually, so the detection of protein-protein interactions (PPIs) is becoming increasingly important. Knowledge of PPIs can provide insight into the molecular mechanisms of biological processes and lead to a better understanding of practical medical applications. In recent years, various high-throughput technologies, such as yeast two-hybrid screening methods [1, 2], immunoprecipitation [3], and protein chips [4], have been developed to detect interactions between proteins. To date, a large quantity of PPI data for different organisms has been generated, and many databases, such as MINT [5], BIND [6], and DIP [7], have been built to store protein interaction data. However, these experimental methods are time-intensive and costly. In addition, they suffer from high rates of false positives and false negatives. For these reasons, predicting unknown PPIs using only biological experimental methods is considered a difficult task.

As a result, a number of computational methods have been proposed to infer PPIs from different sources of information, including phylogenetic profiles, tertiary structures, protein domains, and secondary structures [8–16]. However, these approaches cannot be employed when prior knowledge about a protein of interest is not available. With the rapid growth of protein sequence data, sequence-based methods are becoming the most widely used tools for predicting PPIs, and a number of them have been developed. For example, Bock and Gough [17] used a support vector machine (SVM) combined with several structural and physicochemical descriptors to predict PPIs. Shen et al. [18] developed a conjoint triad method to infer human PPIs. Martin et al. [19] used a descriptor called the signature product of subsequences and an expansion of the signature descriptor based on the available chemical information to predict PPIs. Guo et al. [20] used the SVM model combined with an autocorrelation descriptor to predict Yeast PPIs. Nanni and Lumini [21] proposed a method based on an ensemble of K-local hyperplane distances to infer PPIs. Several other methods based on protein amino acid sequences have been proposed in previous work [22, 23]. Nevertheless, there is still room to improve the accuracy and efficiency of the existing methods.

In this paper, we propose a novel computational method that predicts PPIs using only protein sequence data. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. More specifically, we first represent each protein using a PSSM. Then, an LPQ descriptor is employed to capture useful information from each protein PSSM and generate a 256-dimensional feature vector. Next, PCA is used to reduce the dimensions of the LPQ vector and the influence of noise. Finally, the RVM model is employed as the machine learning approach to carry out classification. The proposed method was evaluated on two different PPI datasets: Yeast and Human. The experimental results are superior to those of the SVM and other previous methods, which indicates that the proposed method performs well in predicting PPIs.

2. Materials and Methodology

2.1. Dataset

To verify the proposed method, two publicly available datasets, Yeast and Human, are used in our study. Both were obtained from the publicly available Database of Interacting Proteins (DIP) [24]. From the Yeast dataset, we selected 5594 positive protein pairs to build the positive dataset and 5594 negative protein pairs to build the negative dataset. Similarly, from the Human dataset, we selected 3899 positive protein pairs to build the positive dataset and 4262 negative protein pairs to build the negative dataset. Consequently, the Yeast dataset contains 11188 protein pairs and the Human dataset contains 8161 protein pairs.

2.2. Position Specific Scoring Matrix

A Position Specific Scoring Matrix (PSSM) is an $M \times 20$ matrix $X = \{X_{ij} : i = 1, \ldots, M,\ j = 1, \ldots, 20\}$ for a given protein, where $M$ is the length of the protein sequence and 20 represents the 20 amino acids [28–33]. The score $X_{ij}$ is assigned to the $j$th amino acid at the $i$th position of the given protein sequence and is expressed as $X_{ij} = \sum_{k=1}^{20} p(i, k) \times q(j, k)$, where $p(i, k)$ is the ratio of the frequency of the $k$th amino acid appearing at position $i$ of the probe to the total number of probes and $q(j, k)$ is the value of Dayhoff's mutation matrix [34] between the $j$th and $k$th amino acids [35–37]. As a result, a high score represents a strongly conserved position and a low score represents a weakly conserved position [38–40].
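The score formula above is a sum of products over the amino acid alphabet, which can be illustrated with a minimal Python sketch. The function name `pssm_scores` and the toy three-letter alphabet with its values are hypothetical and chosen only to keep the arithmetic easy to check; the sketch follows the formula as written in the text and is not the scoring procedure implemented inside PSI-BLAST.

```python
def pssm_scores(p, q):
    """Return X with X[i][j] = sum_k p[i][k] * q[j][k].

    p[i][k]: relative frequency of amino acid k at position i.
    q[j][k]: mutation-matrix value between amino acids j and k.
    """
    return [
        [sum(p_i[k] * q_j[k] for k in range(len(q_j))) for q_j in q]
        for p_i in p
    ]


# Toy example: 2 positions, a 3-letter alphabet instead of 20 amino acids.
p_toy = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
q_toy = [
    [2, -1, 0],
    [-1, 3, 1],
    [0, 1, 4],
]
x_toy = pssm_scores(p_toy, q_toy)
```

In the toy example, the second position contains only the third letter, so its row of scores is simply the third row of the mutation matrix.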

PSSMs are used to predict protein folding patterns, protein quaternary structural attributes, and disulfide connectivity [41, 42]. Here, we also use PSSMs to predict PPIs. In this paper, we used the Position Specific Iterated BLAST (PSI-BLAST) [43] to create a PSSM for each protein sequence. The e-value parameter was set to 0.001, and three iterations were used to obtain broadly and highly homologous sequences. Each resulting PSSM is a matrix with 20 columns, composed of L × 20 elements, where L is the total number of residues in the protein. The rows of the matrix represent the protein residues, and the columns of the matrix represent the 20 amino acids.

2.3. Local Phase Quantization

Local Phase Quantization (LPQ) has been described in detail in the literature [44]. The LPQ method is based on the blur invariance property of the Fourier phase spectrum [45–47]. It is a texture operator designed to be robust to spatial blur in images. Spatially invariant blurring of an original image f(x), resulting in an observed image g(x), can be expressed as a convolution, given by

$$g(\mathbf{x}) = (f \ast h)(\mathbf{x}), \tag{1}$$

where h(x) is the point spread function of the blur, $\ast$ denotes two-dimensional convolution, and x is a vector of coordinates [x, y]T. In the Fourier domain, this amounts to

$$G(\mathbf{u}) = F(\mathbf{u}) \cdot H(\mathbf{u}), \tag{2}$$

where G(u), F(u), and H(u) are the discrete Fourier transforms (DFT) of the blurred image g(x), the original image f(x), and h(x), respectively, and u is a vector of coordinates [u, v]T. According to the characteristic of the Fourier transform, the phase relations can be expressed as

$$\angle G(\mathbf{u}) = \angle F(\mathbf{u}) + \angle H(\mathbf{u}). \tag{3}$$

When the point spread function h(x) is centrally symmetric, meaning h(x) = h(−x), its Fourier transform H(u) is always real valued. As a result, its phase can be expressed as a two-valued function, given by

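$$\angle H(\mathbf{u}) = \begin{cases} 0, & \text{if } H(\mathbf{u}) \geq 0, \\ \pi, & \text{if } H(\mathbf{u}) < 0. \end{cases} \tag{4}$$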

This means that, wherever H(u) ≥ 0,

$$\angle G(\mathbf{u}) = \angle F(\mathbf{u}). \tag{5}$$

For a regular point spread function h(x), the shape of H(u) is close to a Gaussian or sinc function. This ensures that H(u) ≥ 0, and hence ∠G(u) = ∠F(u), at low frequencies, so the low-frequency phase is invariant to blur. In LPQ, the local phase information is extracted using the two-dimensional DFT or, more precisely, a short-term Fourier transform (STFT) computed over a rectangular $M \times M$ neighborhood $N_{\mathbf{x}}$ at each pixel position x of an image f(x), given by

$$F(\mathbf{u}, \mathbf{x}) = \sum_{\mathbf{y} \in N_{\mathbf{x}}} f(\mathbf{x} - \mathbf{y})\, e^{-j 2 \pi \mathbf{u}^{T} \mathbf{y}} = \mathbf{w}_{\mathbf{u}}^{T} \mathbf{f}_{\mathbf{x}}, \tag{6}$$

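where $\mathbf{w}_{\mathbf{u}}$ is the basis vector of the two-dimensional DFT at frequency u and $\mathbf{f}_{\mathbf{x}}$ is a vector containing all $M^2$ image samples from $N_{\mathbf{x}}$. In LPQ, the Fourier coefficients are computed at four frequencies: $\mathbf{u}_1 = [a, 0]^T$, $\mathbf{u}_2 = [0, a]^T$, $\mathbf{u}_3 = [a, a]^T$, and $\mathbf{u}_4 = [a, -a]^T$, where $a$ is a frequency small enough to satisfy $H(\mathbf{u}_i) \geq 0$. As a result, each pixel position is described by a vector, given by

$$\mathbf{F}_{\mathbf{x}}^{c} = [F(\mathbf{u}_1, \mathbf{x}), F(\mathbf{u}_2, \mathbf{x}), F(\mathbf{u}_3, \mathbf{x}), F(\mathbf{u}_4, \mathbf{x})], \qquad \mathbf{F}_{\mathbf{x}} = [\operatorname{Re}\{\mathbf{F}_{\mathbf{x}}^{c}\}, \operatorname{Im}\{\mathbf{F}_{\mathbf{x}}^{c}\}]^{T}. \tag{7}$$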

Then, using a simple scalar quantizer, the resulting vectors are quantized, given by

$$q_j(\mathbf{x}) = \begin{cases} 1, & \text{if } g_j(\mathbf{x}) \geq 0, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$

where $g_j(\mathbf{x})$ is the $j$th component of $\mathbf{F}_{\mathbf{x}}$. After quantization, $\mathbf{F}_{\mathbf{x}}$ becomes an eight-bit binary vector, and its $j$th component ($j = 0, \ldots, 7$) is assigned a weight of $2^j$. As a result, the quantized coefficients are represented as integer values between 0 and 255 using binary coding:

$$f_{\mathrm{LPQ}}(\mathbf{x}) = \sum_{j=0}^{7} q_j(\mathbf{x})\, 2^{j}. \tag{9}$$

Finally, a histogram of these integer values from all image positions is composed and used as a 256-dimensional feature vector in classification. In this paper, the PSSM of each protein from the Yeast and Human datasets was converted to a 256-dimensional feature vector using this LPQ method.
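The steps in (6) to (9) can be sketched in a few lines of plain Python. This is a minimal illustration under several assumptions that the text does not specify: a 3 × 3 neighborhood, the frequency a = 1/3 (one over the window size), no handling of border positions, and a histogram normalized to sum to one. The function name `lpq_histogram` and the random toy matrix are hypothetical, and the decorrelation step used in some LPQ variants is omitted.

```python
import cmath
import math
import random


def lpq_histogram(matrix, win=3):
    """Basic LPQ: return a normalized 256-bin histogram of codes (eqs. 6-9)."""
    rows, cols = len(matrix), len(matrix[0])
    r = win // 2                      # neighborhood radius (win must be odd)
    a = 1.0 / win                     # assumed lowest non-zero frequency
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]   # u1..u4 as (u, v)
    offsets = range(-r, r + 1)
    # Precompute the basis e^{-j 2 pi u^T y} for every offset y = (dx, dy).
    basis = [
        {(dx, dy): cmath.exp(-2j * math.pi * (u * dx + v * dy))
         for dy in offsets for dx in offsets}
        for (u, v) in freqs
    ]
    hist = [0] * 256
    for i in range(r, rows - r):          # skip borders (assumption)
        for j in range(r, cols - r):
            coeffs = []
            for b in basis:
                acc = 0j
                for (dx, dy), w in b.items():
                    acc += matrix[i - dy][j - dx] * w    # f(x - y) term of eq. (6)
                coeffs.append(acc)
            # Eq. (7): real parts followed by imaginary parts.
            g = [c.real for c in coeffs] + [c.imag for c in coeffs]
            # Eqs. (8) and (9): sign quantization and binary coding.
            code = sum(1 << k for k, value in enumerate(g) if value >= 0)
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]


# Toy input standing in for an L x 20 PSSM (random integers, L = 30).
rng = random.Random(0)
toy_pssm = [[rng.randint(-5, 9) for _ in range(20)] for _ in range(30)]
toy_feature = lpq_histogram(toy_pssm)
```

Because the output length is fixed at 256 regardless of the number of rows, proteins of different lengths map to feature vectors of the same size, which is what allows them to be fed to a single classifier.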

2.4. Principal Component Analysis

Principal Component Analysis (PCA) is widely used to process data and reduce the dimensions of datasets. In this way, high-dimensional information can be projected to a low-dimensional subspace, while retaining the main information. The basic principle of PCA is as follows.

A multivariate dataset can be expressed as the following matrix X:

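$$X = \begin{bmatrix} x(1) \\ \vdots \\ x(N) \end{bmatrix}, \qquad x(t) = [x_1(t), \ldots, x_s(t)], \quad t = 1, \ldots, N, \tag{10}$$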

where $s$ is the number of variables and $N$ is the number of samples of each variable. PCA is closely related to the singular value decomposition (SVD) of a matrix; the SVD of $X$ is

$$X = \sum_{i=1}^{s} a_i \mathbf{b}_i \mathbf{c}_i^{T}, \tag{11}$$

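where $\mathbf{c}_i$ is an eigenvector of $X^T X$, $\mathbf{b}_i$ is an eigenvector of $X X^T$, and $a_i$ is a singular value. If there are $m$ linear relationships between the $s$ variables, then $m$ singular values are zero. Any row of $X$ can be expressed in terms of the vectors $(\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k)$:

$$x^{T}(t) = \sum_{i=1}^{k} a_i b_i(t)\, \mathbf{c}_i = \sum_{i=1}^{k} r_i(t)\, \mathbf{q}_i, \tag{12}$$

where $b_i(t)$ is the $t$th component of $\mathbf{b}_i$, $r_i(t) = x(t)\mathbf{q}_i$ is the projection of $x(t)$ on $\mathbf{q}_i$, the vectors $(\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_k)$ are the loading vectors, and $r_i(t)$ is the score.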

When there is a certain degree of linear correlation between the variables of the matrix, the projections of $X$ onto the last few loading vectors are small and arise mainly from measurement noise. As a result, the principal component decomposition of $X$ is represented by

$$X = \mathbf{r}_1 \mathbf{q}_1^{T} + \mathbf{r}_2 \mathbf{q}_2^{T} + \cdots + \mathbf{r}_k \mathbf{q}_k^{T} + E, \tag{13}$$

where $E$ is the error matrix, which can be ignored without an obvious loss of useful information. In this paper, to reduce the influence of noise and improve the prediction accuracy, we use Principal Component Analysis to reduce the dimensionality of the Yeast dataset from 256 to 180 and that of the Human dataset from 256 to 172.
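The decomposition in (13) can be sketched with NumPy's SVD routine. This is an illustration only: the function name `pca_reduce` is hypothetical, the random matrix merely stands in for a set of 256-dimensional LPQ vectors, and centering the columns before the SVD is an assumption, since the text does not say how the data were preprocessed.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the first k loading vectors.

    Returns the scores R (N x k), the loading vectors Q (s x k),
    and the residual E, so that X_centered = R @ Q.T + E as in eq. (13).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # column centering (assumption)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Q = Vt[:k].T                              # loading vectors q_1..q_k as columns
    R = Xc @ Q                                # scores r_i(t) = x(t) q_i
    E = Xc - R @ Q.T                          # discarded part of the data
    return R, Q, E


# Stand-in for 300 LPQ feature vectors of dimension 256 (random values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 256))
R_demo, Q_demo, E_demo = pca_reduce(X_demo, 180)   # 256 -> 180 as for Yeast
```

The loading vectors returned by the SVD are orthonormal, so the scores in `R_demo` are uncorrelated coordinates of the original samples in the reduced space.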

2.5. Relevance Vector Machine

The characteristics of the Relevance Vector Machine have been described in detail in the literature [48]. For a binary classification problem, assume that the training set is $\{x_n, t_n\}_{n=1}^{N}$, where $x_n \in R^d$ is a training sample and $t_n \in \{0, 1\}$ is its label. The label of the $i$th sample is modeled as $t_i = y_i + \varepsilon_i$, where $y_i = \mathbf{w}^T \varphi(x_i) = \sum_{j=1}^{N} w_j K(x_i, x_j) + w_0$ is the classification prediction model and $\varepsilon_i$ is additive noise with a mean of zero and a variance of $\sigma^2$, that is, $\varepsilon_i \sim N(0, \sigma^2)$ and $t_i \sim N(y_i, \sigma^2)$. Assuming that the training samples are independent and identically distributed, the observation vector $\mathbf{t}$ obeys the following distribution [49–51]:

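$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \sigma^2) = (2 \pi \sigma^2)^{-N/2} \exp\left(-\frac{1}{2 \sigma^2} \|\mathbf{t} - \varphi \mathbf{w}\|^2\right), \tag{14}$$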

where φ is defined as follows:

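$$\varphi = \begin{bmatrix} 1 & K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & K(x_N, x_1) & \cdots & K(x_N, x_N) \end{bmatrix}. \tag{15}$$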

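The RVM uses the training labels $\mathbf{t}$ to predict the label $t_*$ of a testing sample, given by

$$p(t_* \mid \mathbf{t}) = \int p(t_* \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w}, \sigma^2 \mid \mathbf{t})\, d\mathbf{w}\, d\sigma^2. \tag{16}$$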

To make most components of the weight vector $\mathbf{w}$ zero and to reduce the computational work of the kernel function, additional conditions are imposed on $\mathbf{w}$. Assuming that $w_i$ follows a distribution with a mean of zero and a variance of $a_i^{-1}$, that is, $w_i \sim N(0, a_i^{-1})$, we have $p(\mathbf{w} \mid \mathbf{a}) = \prod_{i=0}^{N} p(w_i \mid a_i)$, where $\mathbf{a}$ is the hyperparameter vector of the prior distribution of the weight vector $\mathbf{w}$. Hence,

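$$p(t_* \mid \mathbf{t}) = \int p(t_* \mid \mathbf{w}, \mathbf{a}, \sigma^2)\, p(\mathbf{w}, \mathbf{a}, \sigma^2 \mid \mathbf{t})\, d\mathbf{w}\, d\mathbf{a}\, d\sigma^2, \qquad p(t_* \mid \mathbf{w}, \mathbf{a}, \sigma^2) = N(t_* \mid y(x_*; \mathbf{w}), \sigma^2). \tag{17}$$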

Because $p(\mathbf{w}, \mathbf{a}, \sigma^2 \mid \mathbf{t})$ cannot be obtained analytically, it is decomposed using Bayes' rule, given by

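$$p(\mathbf{w}, \mathbf{a}, \sigma^2 \mid \mathbf{t}) = p(\mathbf{w} \mid \mathbf{a}, \sigma^2, \mathbf{t})\, p(\mathbf{a}, \sigma^2 \mid \mathbf{t}), \qquad p(\mathbf{w} \mid \mathbf{a}, \sigma^2, \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w}, \sigma^2)\, p(\mathbf{w} \mid \mathbf{a})}{p(\mathbf{t} \mid \mathbf{a}, \sigma^2)}. \tag{18}$$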

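Integrating the product of $p(\mathbf{t} \mid \mathbf{w}, \sigma^2)$ and $p(\mathbf{w} \mid \mathbf{a})$ over $\mathbf{w}$ gives the marginal likelihood, and the posterior over the weights is Gaussian:

$$\begin{aligned} p(\mathbf{t} \mid \mathbf{a}, \sigma^2) &= (2\pi)^{-N/2}\, |\Omega|^{-1/2} \exp\left(-\frac{\mathbf{t}^{T} \Omega^{-1} \mathbf{t}}{2}\right), \\ \Omega &= \sigma^2 I + \varphi A^{-1} \varphi^{T}, \qquad A = \operatorname{diag}(a_0, a_1, \ldots, a_N), \\ p(\mathbf{w} \mid \mathbf{a}, \sigma^2, \mathbf{t}) &= (2\pi)^{-(N+1)/2}\, |\Sigma|^{-1/2} \exp\left(-\frac{(\mathbf{w} - \mathbf{u})^{T} \Sigma^{-1} (\mathbf{w} - \mathbf{u})}{2}\right), \\ \Sigma &= (\sigma^{-2} \varphi^{T} \varphi + A)^{-1}, \qquad \mathbf{u} = \sigma^{-2} \Sigma \varphi^{T} \mathbf{t}. \end{aligned} \tag{19}$$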

Because $p(\mathbf{a}, \sigma^2 \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{a}, \sigma^2)\, p(\mathbf{a})\, p(\sigma^2)$ cannot be obtained analytically, the solution is approximated using the maximum likelihood method, represented by

$$(\mathbf{a}_{MP}, \sigma_{MP}^2) = \arg\max_{\mathbf{a}, \sigma^2}\, p(\mathbf{t} \mid \mathbf{a}, \sigma^2). \tag{20}$$

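The iterative update for $\mathbf{a}_{MP}$ and $\sigma_{MP}^2$ is as follows:

$$a_i^{\mathrm{new}} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{\mathrm{new}} = \frac{\|\mathbf{t} - \varphi \boldsymbol{\mu}\|^2}{N - \sum_{i=0}^{N} \gamma_i}, \qquad \gamma_i = 1 - a_i \Sigma_{i,i}, \tag{21}$$

where $\boldsymbol{\mu}$ is the posterior mean of the weights (denoted $\mathbf{u}$ in (19)), $\mu_i$ is its $i$th component, and $\Sigma_{i,i}$ is the $i$th element on the diagonal of $\Sigma$. Starting from initial values of $\mathbf{a}$ and $\sigma^2$, the estimates $\mathbf{a}_{MP}$ and $\sigma_{MP}^2$ are approximated by repeatedly applying (21). After enough iterations, most $a_i$ tend to infinity, so the corresponding weights $w_i$ become zero, while the remaining $a_i$ stay finite. The training samples $x_i$ associated with the finite $a_i$ are referred to as relevance vectors.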
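The update loop in (19) and (21) can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: it follows the Gaussian-noise formulation given above and fits the 0/1 labels directly, whereas the experiments set beta = 0 to select a classification mode that is not reproduced here. The function names, the kernel form exp(−‖x − x′‖²/width²), the pruning threshold `a_max`, the floor `min_sigma2`, the initial noise variance, and the toy two-cluster data are all hypothetical choices.

```python
import numpy as np


def design_matrix(X, centers, width):
    """Eq. (15): a column of ones followed by Gaussian kernel columns."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / width ** 2)          # kernel parameterization is assumed
    return np.hstack([np.ones((len(X), 1)), K])


def rvm_fit(Phi, t, n_iter=200, a_max=1e9, min_sigma2=1e-4):
    """Iterate eqs. (19) and (21); return the weight means and the kept columns."""
    N, M = Phi.shape
    a = np.full(M, 1.0 / N ** 2)          # initial hyperparameters, 1/N^2
    sigma2 = max(0.1 * np.var(t), min_sigma2)   # initial noise variance (assumed)
    keep = np.arange(M)                   # basis functions still active
    mu = np.zeros(M)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(a[keep]))   # eq. (19)
        m = Sigma @ P.T @ t / sigma2                                  # posterior mean
        gamma = 1.0 - a[keep] * np.diag(Sigma)                        # eq. (21)
        with np.errstate(divide="ignore", invalid="ignore"):
            a_new = gamma / m ** 2
        # Non-finite or non-positive values are treated as "infinite" alpha.
        a_new = np.where(np.isfinite(a_new) & (a_new > 0), a_new, a_max)
        resid = np.sum((t - P @ m) ** 2)
        sigma2 = max(resid / max(N - gamma.sum(), 1e-12), min_sigma2)
        a[keep] = np.minimum(a_new, a_max)
        mu = np.zeros(M)
        mu[keep] = m
        pruned = a[keep] >= a_max         # weights driven to zero
        mu[keep[pruned]] = 0.0
        keep = keep[~pruned]
        if keep.size == 0:
            break
    return mu, keep


# Toy data: two tight, well-separated clusters with labels 0 and 1.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 0.15, (20, 2)), rng.normal(3.0, 0.15, (20, 2))])
t_toy = np.array([0.0] * 20 + [1.0] * 20)
Phi_toy = design_matrix(X_toy, X_toy, width=0.6)
mu_toy, relevant_toy = rvm_fit(Phi_toy, t_toy)
train_acc = float(((Phi_toy @ mu_toy >= 0.5) == (t_toy == 1.0)).mean())
```

The columns left in `relevant_toy` (other than the bias column 0) correspond to the training samples that play the role of relevance vectors; prediction for a new sample only needs kernel evaluations against those samples.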

2.6. Procedure of the Proposed Method

Our proposed method contains three steps: feature extraction, dimensionality reduction using PCA, and sample classification. Feature extraction consists of two substeps: (1) each protein from the datasets is represented as a PSSM and (2) the PSSM of each protein is expressed as a 256-dimensional vector using the LPQ method. Dimensionality reduction of the original feature vector is achieved using PCA. Finally, sample classification is carried out in two ways: (1) the RVM model is used to classify the Yeast and Human datasets after feature extraction and (2) the SVM model is used to classify the Yeast dataset for comparison. The flow chart of the proposed method is displayed in Figure 1.

Figure 1. The flow chart of the proposed method.

2.7. Performance Evaluation

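To evaluate the feasibility and efficiency of the proposed method, five measures were computed: the accuracy of prediction (Ac), sensitivity (Sn), specificity (Sp), precision (Pe), and the Matthews correlation coefficient (MCC). They are defined as follows:

$$\begin{aligned} \mathrm{Ac} &= \frac{TP + TN}{TP + FP + TN + FN}, \qquad \mathrm{Sn} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{FP + TN}, \qquad \mathrm{Pe} = \frac{TP}{FP + TP}, \\ \mathrm{MCC} &= \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}, \end{aligned} \tag{22}$$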

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. True positives are the number of true interacting pairs correctly predicted. True negatives are the number of true noninteracting pairs predicted correctly. False positives are the number of true noninteracting pairs falsely predicted to be interacting, and false negatives are the number of true interacting pairs falsely predicted to be noninteracting. Moreover, a Receiver Operating Characteristic (ROC) curve was created to evaluate the performance of our proposed method.
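The five measures can be computed directly from the four confusion-matrix counts, as in the following minimal Python sketch. The function name `ppi_metrics` and the example counts are hypothetical; sensitivity uses the standard definition TP/(TP + FN).

```python
import math


def ppi_metrics(tp, tn, fp, fn):
    """Return Ac, Sn, Sp, Pe, and MCC as fractions (multiply by 100 for %)."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                   # sensitivity (recall)
    sp = tn / (tn + fp)                   # specificity
    pe = tp / (tp + fp)                   # precision
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {"Ac": ac, "Sn": sn, "Sp": sp, "Pe": pe, "MCC": mcc}


# Hypothetical counts for a test fold of 200 protein pairs.
example = ppi_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these counts, 175 of 200 pairs are classified correctly, so the accuracy is 0.875, while the MCC (about 0.75) is lower because it also penalizes the imbalance between the two error types.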

3. Results and Discussion

3.1. Performance of the Proposed Method

To avoid overfitting and to test the reliability of our proposed method, we used 5-fold cross-validation in our experiments. More specifically, the whole dataset was divided into five parts; four parts were used to train the model, and the remaining part was used for testing. Five models were thus obtained for each of the Yeast and Human datasets, and each model was evaluated separately. To ensure fairness, the parameters of the RVM model were set to the same values for both datasets. The Gaussian function was selected as the kernel function with the following parameters: width = 0.6, initalpha = $1/N^2$, and beta = 0, where width is the width of the kernel function, N is the number of training samples, and beta = 0 selects classification. The results of the RVM classifier combined with the Position Specific Scoring Matrix, Local Phase Quantization, and Principal Component Analysis on the two datasets are listed in Tables 1 and 2.
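The 5-fold protocol described above can be sketched as an index generator in plain Python. The function name `five_fold_splits`, the random shuffling, and the fixed seed are assumptions; the text does not state how the samples were assigned to the five parts.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each of the n_folds rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)              # shuffling is an assumption
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# The Yeast dataset contains 11188 protein pairs.
splits = list(five_fold_splits(11188))
fold_sizes = sorted(len(test) for _, test in splits)
```

Each sample appears in exactly one test fold, so every protein pair is used for testing once and for training four times.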

Table 1. 5-fold cross-validation results of the proposed method on the Yeast dataset.

| Testing set | Ac (%) | Sn (%) | Pe (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 92.76 | 92.73 | 92.79 | 86.56 |
| 2 | 93.79 | 93.27 | 93.41 | 88.34 |
| 3 | 91.28 | 92.12 | 90.43 | 84.08 |
| 4 | 92.27 | 92.02 | 92.50 | 85.72 |
| 5 | 93.17 | 93.02 | 93.32 | 87.27 |
| Average | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61 |

Table 2. 5-fold cross-validation results of the proposed method on the Human dataset.

| Testing set | Ac (%) | Sn (%) | Pe (%) | MCC (%) |
|---|---|---|---|---|
| 1 | 98.10 | 98.99 | 97.25 | 96.27 |
| 2 | 97.67 | 99.49 | 96.02 | 95.45 |
| 3 | 97.37 | 99.25 | 95.55 | 94.87 |
| 4 | 97.24 | 98.96 | 95.72 | 94.63 |
| 5 | 99.26 | 99.22 | 99.31 | 98.54 |
| Average | 97.92 ± 0.81 | 99.18 ± 0.21 | 96.77 ± 1.57 | 95.95 ± 1.58 |

Using the proposed method on the Yeast dataset, we achieved an average accuracy, sensitivity, precision, and MCC of 92.65%, 92.63%, 92.67%, and 86.40%, with standard deviations of 0.95%, 0.55%, 1.40%, and 1.61%, respectively. Similarly, on the Human dataset we obtained an average accuracy, sensitivity, precision, and MCC of 97.92%, 99.18%, 96.77%, and 95.95%, with standard deviations of 0.81%, 0.21%, 1.57%, and 1.58%, respectively.
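As a worked check of how an "average ± standard deviation" entry is formed, the accuracy column of Table 1 can be summarized with Python's standard library. Using the sample standard deviation (division by n − 1) is an assumption, since the text does not state which estimator was used; for this column it reproduces the reported 92.65 ± 0.95.

```python
import statistics

# Ac (%) values of the five test folds, copied from Table 1.
yeast_accuracy = [92.76, 93.79, 91.28, 92.27, 93.17]

mean_ac = statistics.mean(yeast_accuracy)
std_ac = statistics.stdev(yeast_accuracy)      # sample standard deviation (n - 1)
summary = f"{mean_ac:.2f} ± {std_ac:.2f}"
```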

It can be seen from Tables 1 and 2 that the proposed method is accurate and robust for predicting PPIs. The good performance may be attributed to the feature extraction scheme and to the choice of classifier. The feature extraction scheme contains three data processing steps. First, the PSSM not only describes the order information of the protein sequence but also retains sufficient prior information, which is why it is widely used in other proteomics research. We therefore converted each protein sequence to a PSSM that retains the useful information of the sequence. Second, because Local Phase Quantization is blur invariant in image feature extraction, it can effectively capture information from the PSSMs. Finally, while maintaining the integrity of the information in the PSSM, we reduced the dimensions of each LPQ vector and the influence of noise using Principal Component Analysis. Consequently, the features extracted in this way are well suited to predicting PPIs.

3.2. Comparison with the SVM-Based Method

Although our proposed method achieved reasonably good results on the Yeast and Human datasets, its performance must be further validated against the state-of-the-art support vector machine (SVM) classifier. More specifically, we compared the classification performance of the SVM and RVM models on the Yeast dataset using the same feature extraction method. The LIBSVM tool (available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/) was employed to carry out classification with the SVM. The two SVM parameters, c and g, were optimized using a grid search. In the experiment, we set c = 0.7 and g = 0.6 and used a radial basis function as the kernel function.

The prediction results of the SVM and RVM methods on the Yeast dataset are shown in Table 3, and the ROC curves are displayed in Figure 2. From Table 3, the SVM method achieved 85.34% average accuracy, 84.40% average sensitivity, 86.89% average specificity, and 74.97% average MCC, while the RVM method achieved 92.65% average accuracy, 92.63% average sensitivity, 92.67% average specificity, and 86.40% average MCC. On every measure, the RVM classifier outperforms the SVM classifier. In addition, Figure 2 shows that the ROC curve of the RVM classifier is better than that of the SVM classifier. These results indicate that the RVM classifier of the proposed method is accurate and robust for predicting PPIs. The improved classification performance of the RVM classifier compared with the SVM classifier can be explained by two reasons: (1) the computational work of the kernel function is greatly reduced in the RVM and (2) the RVM does not require the kernel function to satisfy Mercer's condition. Taken together, the results show that the proposed method can yield highly accurate PPI predictions.

Table 3. 5-fold cross-validation results of the SVM and RVM classifiers on the Yeast dataset.

| Model | Testing set | Ac (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|---|
| SVM + PSSM + LPQ | 1 | 85.96 | 84.77 | 87.13 | 75.86 |
| | 2 | 84.18 | 82.86 | 85.43 | 73.33 |
| | 3 | 85.52 | 84.10 | 86.97 | 75.22 |
| | 4 | 85.29 | 84.12 | 86.47 | 74.91 |
| | 5 | 85.76 | 86.16 | 88.45 | 75.55 |
| | Average | 85.34 ± 0.69 | 84.40 ± 1.20 | 86.89 ± 1.09 | 74.97 ± 0.98 |
| RVM + PSSM + LPQ | 1 | 92.76 | 92.73 | 92.79 | 86.56 |
| | 2 | 93.79 | 93.27 | 93.41 | 88.34 |
| | 3 | 91.28 | 92.12 | 90.43 | 84.08 |
| | 4 | 92.27 | 92.02 | 92.50 | 85.72 |
| | 5 | 93.17 | 93.02 | 93.32 | 87.27 |
| | Average | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61 |

Figure 2. Comparison of ROC curves between RVM and SVM on the Yeast dataset.

3.3. Comparison with Other Methods

A number of PPI prediction methods based on protein sequences have been proposed previously. To assess the effectiveness of our proposed method, which uses an RVM model combined with a Position Specific Scoring Matrix, Local Phase Quantization, and Principal Component Analysis, we compared its prediction ability with that of existing methods on the Yeast and Human datasets. It can be seen from Table 4 that the average prediction accuracy of the existing methods is between 75.08% and 89.33% on the Yeast dataset. These accuracies are lower than that of the proposed method, which is 92.65%. Similarly, the precision and sensitivity of our proposed method are also superior to those of the other methods. Table 5 compares the average prediction accuracy of six other methods with that of the proposed method on the Human dataset. From Table 5, the prediction accuracies yielded by the other methods are between 89.3% and 96.4%. None of these methods obtains a higher prediction accuracy than our proposed method. From Tables 4 and 5, it can be observed that the proposed method yielded better prediction results than the other existing methods, including those based on ensemble classifiers. These results indicate that the RVM classifier combined with Local Phase Quantization, the Position Specific Scoring Matrix, and Principal Component Analysis can improve the prediction accuracy relative to current state-of-the-art methods. Our method improves predictions by using a suitable classifier and a novel feature extraction method that captures useful evolutionary information.

Table 4. Predicting ability of different methods on the Yeast dataset.

| Model | Method | Ac (%) | Sn (%) | Pe (%) | MCC (%) |
|---|---|---|---|---|---|
| Guo et al.'s work [20] | ACC | 89.33 ± 2.67 | 89.93 ± 3.60 | 88.77 ± 6.16 | N/A |
| | AC | 87.36 ± 1.38 | 87.30 ± 4.68 | 87.82 ± 4.33 | N/A |
| Zhou et al.'s work [25] | SVM + LD | 88.56 ± 0.33 | 87.37 ± 0.22 | 89.50 ± 0.60 | 77.15 ± 0.68 |
| Yang et al.'s work [26] | Cod1 | 75.08 ± 1.13 | 75.81 ± 1.20 | 74.75 ± 1.23 | N/A |
| | Cod2 | 80.04 ± 1.06 | 76.77 ± 0.69 | 82.17 ± 1.35 | N/A |
| | Cod3 | 80.41 ± 0.47 | 78.14 ± 0.90 | 81.66 ± 0.99 | N/A |
| | Cod4 | 86.15 ± 1.17 | 81.03 ± 1.74 | 90.24 ± 1.34 | N/A |
| You et al.'s work [27] | PCA-EELM | 87.00 ± 0.29 | 86.15 ± 0.43 | 87.59 ± 0.32 | 77.36 ± 0.44 |
| The proposed method | RVM | 92.65 ± 0.95 | 92.63 ± 0.55 | 92.67 ± 1.40 | 86.40 ± 1.61 |

Table 5. Predicting ability of different methods on the Human dataset.

| Model | Ac (%) | Sn (%) | Pe (%) | MCC (%) |
|---|---|---|---|---|
| LDA + RF [28] | 96.4 | 94.2 | N/A | 92.8 |
| LDA + RoF [28] | 95.7 | 97.6 | N/A | 91.8 |
| LDA + SVM [28] | 90.7 | 89.7 | N/A | 81.3 |
| AC + RF [28] | 95.5 | 94.0 | N/A | 91.4 |
| AC + RoF [28] | 95.1 | 93.3 | N/A | 91.0 |
| AC + SVM [28] | 89.3 | 94.0 | N/A | 79.2 |
| The proposed method | 97.92 | 99.18 | 96.77 | 95.95 |

4. Conclusion

Knowledge of PPIs is becoming increasingly important, which has prompted the development of computational methods. Although many approaches have been developed to solve this problem, the effectiveness and robustness of previous prediction models can still be improved. In this study, we explore a novel method using an RVM classifier combined with Local Phase Quantization and a Position Specific Scoring Matrix. The experimental results show that the prediction accuracy of the proposed method is higher than those of previous methods on both the Yeast and Human datasets. The main improvements of the proposed method come from adopting an effective feature extraction method that can capture useful evolutionary information. Moreover, PCA is used to improve the prediction accuracy by integrating the useful information and reducing the influence of noise. In addition, the experimental results show that the RVM model is suitable for predicting PPIs. In conclusion, the proposed method is an efficient and reliable prediction model and can be a useful support tool for future proteomics research.

Acknowledgments

This work is supported by the National Science Foundation of China, under Grants 61373086 and 61572506, and in part by the Shenzhen Foundational Research Funding under Grant JCYJ20150626110425228.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Authors' Contributions

The authors wish it to be known that, in their opinion, Ji-Yong An and Zhu-Hong You should be regarded as joint first authors.

References

  • 1. Gavin A.-C., Bösche M., Krause R., et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–147. doi: 10.1038/415141a.
  • 2. Ito T., Chiba T., Ozawa R., Yoshida M., Hattori M., Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(8):4569–4574. doi: 10.1073/pnas.061034498.
  • 3. Yuen H., Albrecht G., Adrian H., et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002;415(6868):180–183. doi: 10.1038/415180a.
  • 4. Zhu H., Bilgin M., Bangham R., et al. Global analysis of protein activities using proteome chips. Science. 2001;293(5537):2101–2105. doi: 10.1126/science.1062191.
  • 5. Zanzoni A., Montecchi-Palazzi L., Quondam M., Ausiello G., Helmer-Citterich M., Cesareni G. MINT: a molecular INTeraction database. FEBS Letters. 2002;513(1):135–140. doi: 10.1016/s0014-5793(01)03293-8.
  • 6. Bader G. D., Doron B., Hogue C. W. V. BIND: the biomolecular interaction network database. Nucleic Acids Research. 2003;29(1):242–245. doi: 10.1093/nar/29.1.242.
  • 7. Salwinski L., Miller C. S., Smith A. J., Pettit F. K., Bowie J. U., Eisenberg D. DIP, the database of interacting proteins. Nucleic Acids Research. 2002;28(1):289–291. doi: 10.1093/nar/28.1.289.
  • 8. Luo X., Ming Z., You Z., Li S., Xia Y., Leung H. Improving network topology-based protein interactome mapping via collaborative filtering. Knowledge-Based Systems. 2015;90:23–32. doi: 10.1016/j.knosys.2015.10.003.
  • 9. Li S., You Z.-H., Guo H., Luo X., Zhao Z.-Q. Inverse-free extreme learning machine with optimal information updating. IEEE Transactions on Cybernetics. 2015;46(5):1229–1241. doi: 10.1109/tcyb.2015.2434841.
  • 10. You Z.-H., Yu J.-Z., Zhu L., Li S., Wen Z.-K. A MapReduce based parallel SVM for large-scale predicting protein-protein interactions. Neurocomputing. 2014;145:37–43. doi: 10.1016/j.neucom.2014.05.072.
  • 11. You Z.-H., Li S., Gao X., Luo X., Ji Z. Large-scale protein-protein interactions detection by integrating big biosensing data with computational model. BioMed Research International. 2014;2014:9. doi: 10.1155/2014/598129.
  • 12. You Z.-H., Lei Y.-K., Zhu L., Xia J., Wang B. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics. 2013;14(supplement 8, article S10). doi: 10.1186/1471-2105-14-s8-s10.
  • 13. Lei Y.-K., You Z.-H., Ji Z., Zhu L., Huang D.-S. Assessing and predicting protein interactions by combining manifold embedding with multiple information integration. BMC Bioinformatics. 2012;13(supplement 7, article S3). doi: 10.1186/1471-2105-13-s7-s3.
  • 14. You Z.-H., Yin Z., Han K., Huang D.-S., Zhou X. A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network. BMC Bioinformatics. 2010;11(1, article 343). doi: 10.1186/1471-2105-11-343.
  • 15. Lan X., Bonneville R., Apostolos J., Wu W., Jin V. X. W-ChIPeaks: a comprehensive web application tool for processing ChIP-chip and ChIP-seq data. Bioinformatics. 2011;27(3):428–430. doi: 10.1093/bioinformatics/btq669.
  • 16. Lan X., Witt H., Katsumura K., et al. Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages. Nucleic Acids Research. 2012;40(16):7690–7704. doi: 10.1093/nar/gks501.
  • 17. Bock J. R., Gough D. A. Whole-proteome interaction mining. Bioinformatics. 2003;19(1):125–135. doi: 10.1093/bioinformatics/19.1.125.
  • 18. Shen J., Zhang J., Luo X., et al. Predicting protein-protein interactions based only on sequences information. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(11):4337–4341. doi: 10.1073/pnas.0607879104.
  • 19. Martin S., Roe D., Faulon J.-L. Predicting protein-protein interactions using signature products. Bioinformatics. 2005;21(2):218–226. doi: 10.1093/bioinformatics/bth483.
  • 20. Guo Y., Yu L., Wen Z., Li M. Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences. Nucleic Acids Research. 2008;36(9):3025–3030. doi: 10.1093/nar/gkn159.
  • 21. Nanni L., Lumini A. An ensemble of K-local hyperplanes for predicting protein-protein interactions. Bioinformatics. 2006;22(10):1207–1210. doi: 10.1093/bioinformatics/btl055.
  • 22. Nanni L. Fusion of classifiers for predicting protein-protein interactions. Neurocomputing. 2005;68:289–296. doi: 10.1016/j.neucom.2005.03.004.
  • 23. Nanni L. Hyperplanes for predicting protein-protein interactions. Neurocomputing. 2005;69(1–3):257–263. doi: 10.1016/j.neucom.2005.05.007.
  • 24. Xenarios I., Salwínski Ł., Duan X. J., Higney P., Kim S.-M., Eisenberg D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research. 2002;30(1):303–305. doi: 10.1093/nar/30.1.303.
  • 25. Zhou Y. Z., Gao Y., Zheng Y. Y. Prediction of Protein-Protein Interactions Using Local Description of Amino Acid Sequence. Berlin, Germany: Springer; 2011.
  • 26. Yang L., Xia J.-F., Gui J. Prediction of protein-protein interactions from protein sequence using local descriptors. Protein & Peptide Letters. 2010;17(9):1085–1090. doi: 10.2174/092986610791760306.
  • 27. You Z. H., Lei Y. K., Zhu L., Xia J., Wang B. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics. 2013;14(8):69–75. doi: 10.1186/1471-2105-14-69.
  • 28.Liu B., Liu F., Fang L., Wang X., Chou K.-C. RepDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics. 2015;31(8):1307–1309. doi: 10.1093/bioinformatics/btu820. [DOI] [PubMed] [Google Scholar]
  • 29.Liu B., Wang X., Zou Q., Dong Q., Chen Q. Protein remote homology detection by combining chou's pseudo amino acid composition and profile-based protein representation. Molecular Informatics. 2013;32(9-10):775–782. doi: 10.1002/minf.201300084. [DOI] [PubMed] [Google Scholar]
  • 30.Liu B., Wang S., Wang X. DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation. Scientific Reports. 2015;5 doi: 10.1038/srep15479.15479 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Chen X., Yan C. C., Zhang X., et al. WBSMDA: within and between score for MiRNA-disease association prediction. Scientific Reports. 2016;6, article 21106 doi: 10.1038/srep21106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.You Z.-H., Li J., Gao X., et al. Detecting protein-protein interactions with a novel matrix-based protein sequence representation and support vector machines. BioMed Research International. 2015;2015:9. doi: 10.1155/2015/867516.867516 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wong L., You Z.-H., Ming Z., Li J., Chen X., Huang Y.-A. Detection of interactions between proteins through rotation forest and local phase quantization descriptors. International Journal of Molecular Sciences. 2015;17(1):p. 21. doi: 10.3390/ijms17010021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Dayhoff M. A model of evolutionary change in proteins. Atlas of Protein Sequence & Structure. 1978;5:345–352. [Google Scholar]
  • 35.You Z.-H., Chan K. C. C., Hu P. Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest. PLoS ONE. 2015;10(5) doi: 10.1371/journal.pone.0125811.e0125811 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Huang Y., You Z., Gao X., Wong L., Wang L. Using weighted sparse representation model combined with discrete cosine transformation to predict protein-protein interactions from protein sequence. BioMed Research International. 2015;2015:10. doi: 10.1155/2015/902198.902198 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Huang Q., You Z., Zhang X., Zhou Y. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation. International Journal of Molecular Sciences. 2015;16(5):10855–10869. doi: 10.3390/ijms160510855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proceedings of the National Academy of Sciences of the United States of America. 1987;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Liu B., Fang L., Long R., Lan X., Chou K. C. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics. 2016;32(3):362–369. doi: 10.1093/bioinformatics/btv604. [DOI] [PubMed] [Google Scholar]
  • 40.Bin L., Junjie C., Xiaolong X. Application of learning to rank to protein remote homology detection. Bioinformatics. 2015;31(21):3492–3498. doi: 10.1093/bioinformatics/btv413. [DOI] [PubMed] [Google Scholar]
  • 41.Liu B., Liu F., Wang X., Chen J., Fang L., Chou K. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research. 2015;43(1):W65–W71. doi: 10.1093/nar/gkv458. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Liu B., Zhang D., Xu R., et al. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics. 2014;30(4):472–479. doi: 10.1093/bioinformatics/btt709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Altschul S. F., Koonin E. V. Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends in Biochemical Sciences. 1998;23(11):444–447. doi: 10.1016/s0968-0004(98)01298-5. [DOI] [PubMed] [Google Scholar]
  • 44.Ojansivu V., Heikkilä J. Blur insensitive texture classification using local phase quantization. In: Elmoataz A., Lezoray O., Nouboud F., Mammass D., editors. Image and Signal Processing. Vol. 5099. 2008. pp. 236–243. (Lecture Notes in Computer Science). [DOI] [Google Scholar]
  • 45.Wang H., Song A., Li B., Xu B., Li Y. Psychophysiological classification and experiment study for spontaneous EEG based on two novel mental tasks. Technology and Health Care. 2015;23(supplement 2):S249–S262. doi: 10.3233/thc-150960. [DOI] [PubMed] [Google Scholar]
  • 46.Li Y., Olson E. B. Structure tensors for general purpose LIDAR feature extraction. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11); May 2011; Shanghai, China. pp. 1869–1874. [DOI] [Google Scholar]
  • 47.Li Y., Olson E. B. A general purpose feature extractor for light detection and ranging data. Sensors. 2010;10(11):10356–10375. doi: 10.3390/s101110356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Tipping M. E. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research. 2001;1(3):211–244. doi: 10.1162/15324430152748236. [DOI] [Google Scholar]
  • 49.Li Y., Li S., Song Q., Liu H., Meng M. Q.-H. Fast and robust data association using posterior based approximate joint compatibility test. IEEE Transactions on Industrial Informatics. 2014;10(1):331–339. doi: 10.1109/TII.2013.2271506. [DOI] [Google Scholar]
  • 50.Li S., Li Y. Nonlinearly activated neural network for solving time-varying complex sylvester equation. IEEE Transactions on Cybernetics. 2014;44(8):1397–1407. doi: 10.1109/TCYB.2013.2285166. [DOI] [PubMed] [Google Scholar]
  • 51.Li Y., Li S., Ge Y. A biologically inspired solution to simultaneous localization and consistent mapping in dynamic environments. Neurocomputing. 2013;104:170–179. doi: 10.1016/j.neucom.2012.10.011. [DOI] [Google Scholar]

Articles from BioMed Research International are provided here courtesy of Wiley
