Scientific Reports. 2017 Jul 19;7:5827. doi: 10.1038/s41598-017-06195-y

Prediction of presynaptic and postsynaptic neurotoxins by combining various Chou’s pseudo components

Haiyan Huo1,#, Tao Li2,#, Shiyuan Wang3, Yingli Lv3, Yongchun Zuo4, Lei Yang3
PMCID: PMC5517432  PMID: 28724993

Abstract

Presynaptic and postsynaptic neurotoxins are two groups of neurotoxins. Identifying them is an important task given the large number of newly discovered toxins, but determining these two groups experimentally is both costly and time-consuming. As a complement, computational methods for predicting presynaptic and postsynaptic neurotoxins can provide useful information in a timely manner. In this study, we describe four algorithms for predicting presynaptic and postsynaptic neurotoxins from sequence-derived features: Increment of Diversity (ID), Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK). Each protein sequence was encoded by pseudo amino acid (PseAA) compositions and three kinds of biological motif features: MEME, Prosite and InterPro motif features. The Maximum Relevance Minimum Redundancy (MRMR) feature selection method was used to rank the PseAA compositions, and the 50 top-ranked features were selected to improve prediction accuracy. The PseAA compositions and the three kinds of biological motif features were combined into 12 parameter sets, defined as P1-P12, which were used as the input of ID, MNBC, RF, and IBK. The best results obtained in this study (overall accuracy 95.92%, CC 0.9208) were clearly better than those of the previously developed method on the same dataset (89.80%, 0.7963).

Introduction

Neurotoxins can be divided into presynaptic and postsynaptic neurotoxins based on their mechanism of action1. Presynaptic neurotoxins are commonly called β-neurotoxins. These neurotoxins act on the plasma membranes of nerve endings, promote the generation of interterminal signals, and lead to a massive stimulation of neuromediator release2–4. Presynaptic neurotoxins are rich sources of phospholipases5–9 and produce neuromuscular blockade by inhibiting the release of acetylcholine from the presynaptic membrane10. Postsynaptic neurotoxins are commonly called α-neurotoxins11–13, and most of them come from snake venoms. Postsynaptic neurotoxins bind specifically to the nicotinic acetylcholine receptor, preventing nerve transmission and leading to death from asphyxiation14–17. Because their action resembles that of curare, a reversible acetylcholine receptor antagonist, postsynaptic neurotoxins are often referred to as “curare-mimetic toxins”5. These two groups of neurotoxins contribute to our understanding of the molecular steps of neurotransmission, and have potential uses in cell biology and neuroscience research as well as in therapeutics for some human neurological disorders. For example, presynaptic neurotoxins have been used to treat migraine headache and cerebral palsy18. With the large number of neurotoxin sequences generated in the post-genomic era, a method for identifying neurotoxins is desirable for both basic research and drug discovery.

In recent years, many computational algorithms have been developed for analyzing and predicting toxins. Short animal toxin and toxin-like protein sequences can be predicted by the web-based classifier ClanTox19, 20. Neurotoxins and bacterial toxins derived from Swiss-Prot have been predicted using a Feed-forward Neural Network (FNN), a Partial Recurrent Neural Network (RNN) and Support Vector Machines (SVMs)21–23. Four conotoxin superfamilies were predicted for 116 conotoxin sequences using the ISort predictor, Least Hamming, multi-class SVMs, one-versus-rest SVMs24, a modified Mahalanobis discriminant25, and dHKNN26. Four conotoxin superfamilies were also predicted by SVM for 261 conotoxin sequences collected from Swiss-Prot27. In our previous work, based on the Animal Toxin Database (ATDB)28, 29, presynaptic and postsynaptic neurotoxins were predicted by Increment of Diversity (ID)30, giving a correlation coefficient (CC) of 0.7963 in the jackknife test.

In this study, four algorithms were proposed for predicting presynaptic and postsynaptic neurotoxins: Increment of Diversity (ID), Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK). Pseudo amino acid (PseAA) compositions, MEME motif features31, Prosite motif features32 and InterPro motif features33 were used to represent the protein sequences. The Maximum Relevance Minimum Redundancy (MRMR)34, 35 method was used to rank the features in order to improve the performance of the predictors. When these algorithms were applied to the neurotoxin dataset of 78 presynaptic and 69 postsynaptic neurotoxins, the overall success rates obtained by the jackknife test were substantially higher than those of the existing classifier on the same dataset. In addition, as demonstrated by a series of recent publications36–43 in compliance with Chou’s 5-step rule44, establishing a really useful sequence-based statistical predictor for a biological system requires the following five guidelines: (a) construct or select a valid benchmark dataset to train and test the predictor; (b) formulate the biological sequence samples with an effective mathematical expression that truly reflects their intrinsic correlation with the target to be predicted; (c) introduce or develop a powerful algorithm (or engine) to perform the prediction; (d) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (e) establish a user-friendly web server for the predictor that is accessible to the public. Below, we describe how these steps were dealt with one by one.

Results

Phylogenetic trees of presynaptic and postsynaptic neurotoxins

In this study, the Molecular Evolutionary Genetics Analysis (MEGA) software45 was used to construct the phylogenetic trees of presynaptic and postsynaptic neurotoxins; only neurotoxins with signal peptides were uploaded to MEGA for tree generation. The phylogenetic trees for presynaptic and postsynaptic neurotoxins are shown in Fig. 1A and B, respectively. These trees illustrate the inferred evolutionary relationships within the two groups of neurotoxins; neurotoxins in the same branch are believed to share a common ancestor. Fig. 1A and B may also help us better understand how presynaptic and postsynaptic neurotoxins diversified over time.

Figure 1.


The phylogenetic trees for (A) presynaptic neurotoxins and (B) postsynaptic neurotoxins.

Analysis of Prosite motif features

Among the 78 presynaptic neurotoxins, PS00118 was conserved in 29 sequences and PS00119 in 31 sequences. PS00118 is the phospholipase A2 histidine active site pattern, centered on the active-site histidine, and PS00119 is the phospholipase A2 aspartic acid active site pattern, centered on the active-site aspartic acid. Both PS00118 and PS00119 contain three cysteines involved in disulfide bonds. PS60004, which belongs to PROSITE documentation entry PDOC60004, is the omega-conotoxin family signature and appears in 19 presynaptic neurotoxins. Omega-conotoxins are calcium channel blockers, and PS60004 includes the cysteine arrangement [C-C-CC-C-C]. PS00280, PS01138, PS01186, PS60015, PS60021, PS60022, PS60023 and PS60025 were also observed in presynaptic neurotoxins. Among the postsynaptic neurotoxins, PS00272, the snake toxin signature, was observed in 49 sequences. Snake toxins are a group of short and long neurotoxins, cytotoxins, short toxins and miscellaneous venom peptides. The snake toxin signature includes four conserved cysteines and a conserved proline, which are thought to be important for maintaining the tertiary structure; the second cysteine in this pattern is linked to the third by a disulfide bond. PS60014, the alpha-conotoxin family signature, appears in 8 postsynaptic neurotoxins. This pattern includes a common part of the cysteine arrangement [CC-C-C], and its four conserved cysteines are believed to be important for maintaining the tertiary structure of alpha-conotoxins.
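The cysteine arrangements quoted above ([C-C-CC-C-C] for PS60004 and [CC-C-C] for PS60014) can be read directly off a sequence by keeping each run of adjacent cysteines together and collapsing every stretch of other residues between runs into a single dash. The following is a minimal sketch; the sequences are toy examples rather than ATDB entries, and the function only recovers the arrangement, not the full PROSITE signatures, which also constrain loop lengths and residues.

```python
import re


def cysteine_framework(seq: str) -> str:
    """Return the cysteine arrangement of a sequence, e.g. 'C-C-CC-C-C'.

    Runs of adjacent cysteines stay together ("CC"); residues before the first
    and after the last cysteine are ignored, and any stretch of non-cysteine
    residues between two runs becomes one "-".
    """
    runs = re.findall(r"C+", seq.upper())
    return "-".join(runs)


# Toy sequences (hypothetical, not taken from the dataset)
print(cysteine_framework("AKCSGPGSSCSRTSYNCCRSCNPYTKRC"))
print(cysteine_framework("GCCSNPVCHLEHSNLC"))
```

The first toy sequence has the omega-conotoxin-like arrangement and the second the alpha-conotoxin-like arrangement; a sequence without cysteines yields an empty string.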

Comparing the MEME motifs (Fig. 2) with the Prosite motifs shows that the conserved region from the fourth to the eleventh site of presynaptic neurotoxin motif 2 corresponds to PS00118, indicating that presynaptic neurotoxin motif 2 may share the biological function of PS00118. Likewise, PS00119 corresponds to the conserved region from the third to the eleventh site of presynaptic neurotoxin motif 3, and the conserved region from the tenth to the twenty-second site of PS00272 corresponds to the first to the twelfth site of postsynaptic neurotoxin motif 2.

Figure 2.


MEME motifs for (A) presynaptic neurotoxins motif 1, (B) presynaptic neurotoxins motif 2, (C) presynaptic neurotoxins motif 3, (D) postsynaptic neurotoxins motif 1, (E) postsynaptic neurotoxins motif 2, and (F) postsynaptic neurotoxins motif 3 in logo format. The regular expression for each MEME motif is shown at the bottom of each panel.

Prediction of presynaptic and postsynaptic neurotoxins

To investigate the influence of different parameters on prediction quality, 12 different parameter sets (P1-P12) were used as the input of ID, MNBC, RF, and IBK. The jackknife test results obtained by ID, MNBC, RF, and IBK with these 12 parameter sets are shown in Tables 1 and 2 and Fig. 3A and B.

Table 1.

Results obtained by ID, MNBC, RF and IBK in identifying presynaptic and postsynaptic neurotoxins with 12 parameters.

Pre = presynaptic neurotoxins, Post = postsynaptic neurotoxins; all Sn and Sp values in %.
Parameters ID:Pre-Sn ID:Pre-Sp ID:Post-Sn ID:Post-Sp MNBC:Pre-Sn MNBC:Pre-Sp MNBC:Post-Sn MNBC:Post-Sp RF:Pre-Sn RF:Post-Sn RF:Pre-Sp RF:Post-Sp IBK:Pre-Sn IBK:Post-Sn IBK:Pre-Sp IBK:Post-Sp
P1a 88.46 92.00 91.30 87.50 91.03 92.21 91.30 90.00 96.15 82.61 86.21 95.00 88.46 82.61 85.19 86.36
P2 92.31 92.31 91.30 91.30 92.31 92.31 91.30 91.30 98.72 84.06 87.50 98.31 92.31 85.51 87.80 90.77
P3 91.03 92.21 91.30 90.00 93.59 92.41 91.30 92.65 94.87 86.96 89.16 93.75 91.03 89.86 91.03 89.86
P4 93.59 92.41 91.30 92.65 94.87 92.50 91.30 94.03 96.15 88.41 90.36 95.31 93.59 88.41 90.12 92.42
P5 93.59 92.41 91.30 92.65 91.03 92.21 91.30 90.00 97.44 85.51 88.37 96.72 92.31 88.41 90.00 91.04
P6 94.87 92.50 91.30 94.03 93.59 92.41 91.30 92.65 97.44 85.51 88.37 96.72 94.87 88.41 90.24 93.85
P7 97.44 91.57 89.86 96.88 98.72 91.67 89.86 98.41 96.15 88.41 90.36 95.31 84.62 88.41 89.19 83.56
P8 100.0 90.70 88.41 100.0 100.0 91.76 89.86 100.0 100.00 89.86 91.76 100.00 87.18 88.41 89.47 85.92
P9 98.72 92.77 91.30 98.44 98.72 91.67 89.86 98.41 97.44 91.30 92.68 96.92 88.46 88.41 89.61 87.14
P10 100.0 91.76 89.86 100.0 100.0 90.70 88.41 100.0 100.00 89.86 91.76 100.00 92.31 94.20 94.74 91.55
P11 98.72 91.67 89.86 98.41 97.44 92.68 91.30 96.92 97.44 91.43 92.68 96.97 89.74 92.75 93.33 88.89
P12 98.72 92.77 91.30 98.44 100.0 92.86 91.30 100.0 100.00 91.30 92.86 100.00 92.31 94.20 94.74 91.55

aTaken from ref. 30, obtained using Increment of Diversity (ID).

Table 2.

Overall predictive accuracy and CC values obtained by ID, MNBC, RF and IBK in identifying presynaptic and postsynaptic neurotoxins with 12 parameters.

Parameters ID:Acc(%) ID:CC MNBC:Acc(%) MNBC:CC RF:Acc(%) RF:CC IBK:Acc(%) IBK:CC
P1a 89.80 0.7963 91.16 0.8227 89.80 0.7998 85.71 0.7131
P2 91.84 0.8361 91.84 0.8361 91.84 0.8428 89.12 0.7819
P3 91.16 0.8227 92.52 0.8497 91.16 0.8237 90.48 0.8088
P4 92.52 0.8497 93.20 0.8635 92.52 0.8511 91.16 0.8227
P5 92.52 0.8497 91.16 0.8227 91.84 0.8401 90.48 0.8088
P6 93.20 0.8635 92.52 0.8497 91.84 0.8401 91.84 0.8368
P7 93.88 0.8786 94.56 0.8932 92.52 0.8511 86.39 0.7289
P8 94.56 0.8954 95.24 0.9080 95.24 0.9080 87.76 0.7549
P9 95.24 0.9061 94.56 0.8932 94.56 0.8917 88.44 0.7681
P10 95.24 0.9080 94.56 0.8954 95.24 0.9080 93.20 0.8640
P11 94.56 0.8932 94.56 0.8917 94.59 0.8990 91.16 0.8236
P12 95.24 0.9061 95.92 0.9208 95.92 0.9208 93.20 0.8640

aTaken from ref. 30, obtained using Increment of Diversity (ID).

Figure 3.


(A) Overall predictive accuracies and (B) CC values obtained by four different algorithms with 12 parameters.

In this study, when P12 was used as the input of ID, MNBC, RF, and IBK, MNBC and RF both achieved an overall accuracy of 95.92% and a CC value of 0.9208, the highest overall accuracy and CC value in this study and higher than the results of our previous work30. With the same input parameters, MNBC generally had the best prediction quality of the four algorithms. For example, with P1, P2, P3, P4, P7, P8 and P12, the CC values of MNBC were 0.8227, 0.8361, 0.8497, 0.8635, 0.8932, 0.9080 and 0.9208, which were 0.0264, 0, 0.0270, 0.0138, 0.0146, 0.0126 and 0.0147 higher than those of ID. For these seven parameter sets, the overall accuracies obtained by MNBC were also higher than or equal to those of ID, RF and IBK, although MNBC was outperformed by at least one other algorithm with P5, P6, P9, P10 and P11. These results suggest that MNBC generally performs better than the other three algorithms for predicting presynaptic and postsynaptic neurotoxins.
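The accuracy comparison in this paragraph can be checked directly against Table 2. The sketch below transcribes the overall accuracy of each algorithm for P1-P12 and lists the parameter sets for which MNBC is at least as accurate as every other algorithm; the helper name is our own.

```python
# Overall accuracies (%) transcribed from Table 2, in the order ID, MNBC, RF, IBK
ACC = {
    "P1": (89.80, 91.16, 89.80, 85.71),
    "P2": (91.84, 91.84, 91.84, 89.12),
    "P3": (91.16, 92.52, 91.16, 90.48),
    "P4": (92.52, 93.20, 92.52, 91.16),
    "P5": (92.52, 91.16, 91.84, 90.48),
    "P6": (93.20, 92.52, 91.84, 91.84),
    "P7": (93.88, 94.56, 92.52, 86.39),
    "P8": (94.56, 95.24, 95.24, 87.76),
    "P9": (95.24, 94.56, 94.56, 88.44),
    "P10": (95.24, 94.56, 95.24, 93.20),
    "P11": (94.56, 94.56, 94.59, 91.16),
    "P12": (95.24, 95.92, 95.92, 93.20),
}
ALGORITHMS = ("ID", "MNBC", "RF", "IBK")


def mnbc_at_least_as_good(table):
    """Parameter sets where MNBC's accuracy is >= that of every algorithm."""
    mnbc = ALGORITHMS.index("MNBC")
    return [p for p, row in table.items() if all(row[mnbc] >= v for v in row)]


print(mnbc_at_least_as_good(ACC))
```

This returns P1, P2, P3, P4, P7, P8 and P12, the same seven parameter sets used in the CC comparison with ID.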

For the same algorithm, performance improved when sequence-derived features were combined with motif features, compared with using the sequence-derived features alone. For ID, with P2, P3, P4, P5 and P6 as input, the CC values were 0.8361, 0.8227, 0.8497, 0.8497 and 0.8635, respectively, all higher than the CC value obtained with P1. MNBC, RF and IBK likewise obtained higher CC values with these parameter sets than with P1, the only exception being MNBC with P5, which matched its P1 value. In addition, the results obtained with the 19 motifs (13 Prosite motifs and 6 MEME motifs) were better than those obtained with the 13 Prosite motifs or the 6 MEME motifs alone in most cases. These results indicate that MEME, Prosite and InterPro motif features substantially improve the predictive power of ID, MNBC, RF and IBK for presynaptic and postsynaptic neurotoxins.

With the same algorithm, prediction performance was also improved by feature selection. Tables 1 and 2 list the results of ID, MNBC, RF and IBK with parameters P1-P7. Except for IBK, the algorithms obtained higher or equal overall accuracy with P7 than with P1-P6. For example, with P7 as input, the CC value of ID was 0.8786, which was 0.0823, 0.0425, 0.0559, 0.0289, 0.0289 and 0.0151 higher than with P1-P6, respectively. Similarly, for MNBC and RF, the CC values obtained with P7 were higher than or equal to those obtained with P1-P6. These results indicate that the MRMR feature selection method was effective and helpful for the prediction of presynaptic and postsynaptic neurotoxins.

As shown in Tables 1 and 2, the sensitivity for presynaptic neurotoxins and the specificity for postsynaptic neurotoxins varied considerably with the parameters, indicating that predictions for presynaptic neurotoxins were more sensitive to the choice of parameters than those for postsynaptic neurotoxins. A likely reason is that more protein motifs were discovered in the presynaptic neurotoxins than in the postsynaptic neurotoxins: ScanProsite found 11 Prosite motifs in the presynaptic neurotoxins but only 2 in the postsynaptic neurotoxins.

As shown in Tables 1 and 2, ID achieved its best results (CC = 0.9080) with P10 as input. In this case, all presynaptic neurotoxins were predicted correctly, and 7 postsynaptic neurotoxins were predicted incorrectly. The ATDB entry numbers of these 7 postsynaptic neurotoxins are AT0001110, AT0000526, AT0002477, AT0000527, AT0000327, AT0002380 and AT0000334. No MEME motifs were discovered in these postsynaptic neurotoxins, and Prosite and InterPro motifs were discovered only in AT0001110 and AT0002380. However, AT0001110 and AT0002380 belong to both the presynaptic and the postsynaptic neurotoxins, and here they were predicted as presynaptic neurotoxins. Based on these results, we suspect that motif features play an important role in distinguishing presynaptic from postsynaptic neurotoxins.

Discussion

In this paper, 12 different parameter sets were used as the input of ID, MNBC, RF, and IBK to predict presynaptic and postsynaptic neurotoxins. The jackknife test results are shown in Tables 1 and 2 and Fig. 3. Because the different methods gave similar results with the same parameters, we suspect that the choice among ID, MNBC, RF, and IBK has relatively little impact on the prediction of presynaptic and postsynaptic neurotoxins; this may be an intrinsic characteristic of machine learning algorithms that also occurs in other prediction problems. By contrast, the input parameters had a large impact on the prediction results. Taking the ID algorithm as an example, Acc increased from 89.80% to 95.24% and CC from 0.7963 to 0.9080 for predicting presynaptic and postsynaptic neurotoxins. Similar improvements in Acc and CC were obtained with the other three algorithms. Thus, the input parameters appear to have more impact on the prediction results than the choice of algorithm.

In our previous work30, using the same dataset of 78 presynaptic and 69 postsynaptic neurotoxins, the highest Sn, Sp and CC obtained by Increment of Diversity (ID) were 88.46%, 92.00% and 0.7963 for presynaptic neurotoxins, and 91.30%, 87.50% and 0.7963 for postsynaptic neurotoxins, respectively. In this study, the best Sn, Sp and CC were 100.0%, 92.86% and 0.9208 for presynaptic neurotoxins, and 91.30%, 100.0% and 0.9208 for postsynaptic neurotoxins, respectively. These results suggest that the prediction algorithms presented in this study have an advantage over the previous one.

With the increasing number of toxins in public databases, it is indispensable to develop reliable methods for classifying presynaptic and postsynaptic neurotoxins. In this study, ID, MNBC, RF, and IBK were applied to classify presynaptic and postsynaptic neurotoxins, and a promising feature representation was presented that embeds PseAA compositions, MEME motif features, Prosite motif features and InterPro motif features to represent a protein sample. The MRMR feature selection method was used to select the 50 top-ranked PseAA compositions to improve the predictive results. To obtain the best performance from the proposed algorithms, different kinds of motif features and PseAA compositions were combined and used as the input parameters of the four algorithms. The predictive results presented in this study indicate that (1) the MRMR feature selection method, complemented with motif features, can substantially improve the prediction quality for neurotoxins; and (2) with different input parameters, different algorithms may outperform the others. The best prediction results were obtained when 50 PseAA compositions, 46 InterPro motif features and 6 MEME motif features were used as the input of MNBC and RF. In summary, ID, MNBC, RF and IBK with 50 PseAA compositions and biological motif features as input were reliable for predicting presynaptic and postsynaptic neurotoxins. We hope that these machine learning algorithms will support the identification of neurotoxins in the future. The proposed algorithms may become useful tools for bridging the gap between the huge number of toxins in public databases and the relatively small number of toxins that have been functionally characterized.

As pointed out in Shen and Chou46 and demonstrated in a series of recent publications36, 37, 41, 47–54, user-friendly and publicly accessible web servers represent the future direction for developing practically more useful methods and will significantly enhance their impact55. We shall make efforts in our future work to provide a web server for the method presented in this paper.

Methods

Datasets

The dataset constructed by Yang and Li was used to evaluate the effectiveness of the new prediction methods30. The protein sequences in this dataset were downloaded from the Animal Toxin Database (ATDB)28, 29. PISCES56, 57 was used to cull the presynaptic and postsynaptic neurotoxin sequences so that no two proteins in each dataset shared more than 80% sequence identity. The final dataset consists of 78 presynaptic and 69 postsynaptic neurotoxin sequences.
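PISCES is a web server with its own alignment-based procedure, and the sketch below is not PISCES. It only illustrates greedy redundancy removal at the 80% cut-off, using a crude, hypothetical identity proxy (length of the longest common subsequence divided by the length of the shorter sequence) in place of a real pairwise alignment; the function names are our own.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity proxy: LCS length over the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


def cull(sequences: list[str], max_identity: float = 0.8) -> list[str]:
    """Greedily keep a sequence only if it is at most max_identity identical to every kept one."""
    kept: list[str] = []
    for seq in sequences:
        if all(identity(seq, k) <= max_identity for k in kept):
            kept.append(seq)
    return kept


print(cull(["ACDEFGHIK", "ACDEFGHIK", "WWWWWWWW"]))
```

Because the LCS ignores gap penalties, this proxy tends to overestimate identity relative to a proper alignment, so it culls more aggressively than an alignment-based tool would.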

Machine learning approaches

In this study, Increment of Diversity (ID)58, Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK) were used to classify presynaptic and postsynaptic neurotoxins. The ID algorithm was implemented in C++, while the other algorithms were run in the Weka package59.
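The ID classifier is not spelled out here (see ref. 58). The following is a minimal sketch, assuming the standard formulation in which the diversity of a count vector is D = N ln N − Σ n_i ln n_i and a query X is assigned to the class whose pooled "source" vector Y gives the smallest increment D(X + Y) − D(X) − D(Y). How the authors' C++ implementation combines different feature groups is not described, so that part is not modelled. The sketch also shows the jackknife (leave-one-out) protocol on toy count vectors.

```python
import math


def diversity(counts):
    """Diversity D(X) = N ln N - sum_i n_i ln n_i of a vector of counts (N = sum_i n_i)."""
    n = sum(counts)
    total = n * math.log(n) if n > 0 else 0.0
    return total - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); a smaller value means X is more similar to Y."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


def class_sources(samples):
    """Pool (sum) the count vectors of each class into one 'source' vector per class."""
    sources = {}
    for vector, label in samples:
        if label not in sources:
            sources[label] = list(vector)
        else:
            sources[label] = [a + b for a, b in zip(sources[label], vector)]
    return sources


def predict(x, sources):
    """Assign x to the class whose source gives the smallest increment of diversity."""
    return min(sources, key=lambda label: increment_of_diversity(x, sources[label]))


def jackknife_accuracy(samples):
    """Leave-one-out: hold each sample out, rebuild the sources without it, then predict it."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        correct += predict(x, class_sources(rest)) == label
    return correct / len(samples)


# Toy count vectors over three hypothetical features; not real neurotoxin data
TOY = [
    ([5, 1, 0], "pre"), ([6, 0, 1], "pre"), ([4, 1, 1], "pre"),
    ([0, 1, 6], "post"), ([1, 0, 5], "post"), ([0, 2, 5], "post"),
]
print(jackknife_accuracy(TOY))
```

On these well-separated toy vectors every held-out sample is assigned to its own class, so the jackknife accuracy is 1.0. Proportional count vectors have an increment of (numerically) zero, which is why ID behaves like a similarity measure between composition profiles.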

Pseudo amino acid composition

It is very important to select a reasonable set of parameters for protein sequence prediction. As mentioned in previous works, pseudo amino acid composition (PseAAC) is a widely used approach for representing protein sequences42, 44, 60–71, and it can be generated by a series of powerful web servers developed recently. In this study, following the concept of Chou’s PseAA compositions72–74, the 400 dipeptide compositions were selected as the parameters of our approaches; they are defined in a 400-dimensional (400-D) space, formulated as:

$$Y = \{y_1, y_2, \ldots, y_{400}\} \tag{1}$$

where $y_i$ ($i = 1, 2, \ldots, 400$) is the absolute occurrence frequency of the $i$-th dipeptide.
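The 400-D vector of Eq. (1) can be computed by counting overlapping pairs of adjacent residues. The following is a minimal sketch, assuming the 20 standard amino acids in alphabetical one-letter order (the ordering of the 400 dimensions is our assumption) and skipping any pair that contains another symbol such as X.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 dipeptides in a fixed order, and a lookup from dipeptide to its dimension
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_counts(seq: str) -> list[int]:
    """Return the 400-D vector of absolute dipeptide occurrence counts (overlapping pairs)."""
    y = [0] * len(DIPEPTIDES)
    seq = seq.upper()
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            y[idx] += 1
    return y


v = dipeptide_counts("ACCA")
print(len(v), sum(v), v[INDEX["AC"]], v[INDEX["CC"]], v[INDEX["CA"]])
```

A sequence of length L contributes at most L − 1 counts, so longer sequences have larger totals; the document describes absolute frequencies, so no normalisation is applied here.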

Maximum Relevance Minimum Redundancy

In this study, MRMR34, 35 was applied to the 400 PseAA compositions. Considering both the predictive accuracy and the MRMR score, the top 50 features were selected as the input parameters of the machine learning algorithms; they are defined in a 50-dimensional (50-D) space, formulated as:

$$Z = \{z_1, z_2, z_3, \ldots, z_{50}\} \tag{2}$$
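The greedy ranking behind MRMR can be sketched as follows. This assumes discrete feature values (the dipeptide counts would first need to be discretised, and how this was done here is not stated) and the "mutual information difference" criterion, which scores a candidate feature by its relevance I(f; c) minus its mean mutual information with the features already selected. It illustrates the idea only and is not the implementation of refs 34 and 35; the toy features are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR (difference criterion): maximise I(f; c) minus mean I(f; s) over selected s.

    features: dict mapping a feature name to its discrete values, one per sample.
    Returns up to k feature names in selection order.
    """
    relevance = {name: mutual_information(v, labels) for name, v in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so ties are broken deterministically
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a"; "c" is less relevant but carries different information
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 0, 1, 0, 1, 1, 0, 1],
}
print(mrmr(features, labels, 2))
```

Here "a" is picked first for its relevance, and "c" is preferred over the equally relevant "b" in the second step because "b" is completely redundant with "a". That redundancy penalty is what distinguishes MRMR from ranking features by relevance alone.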

MEME motif features

In this study, the presynaptic and postsynaptic neurotoxin datasets were uploaded to the MEME software to conduct a motif search31. The maximum number of motifs was set to 3 and the maximum motif length to 15. The logos and regular expressions of these motifs are shown in Fig. 2. Six MEME motifs were obtained, three for the presynaptic and three for the postsynaptic neurotoxins, giving 6 motif features. Each element of the feature vector represents the presence or absence of a motif in the protein sequence: the feature value is 1 if the motif is present and 0 otherwise. Consequently, each protein sequence was converted into a 6-dimensional (6-D) vector, formulated as:

$$M = \{m_1, m_2, \ldots, m_6\} \tag{3}$$
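Each MEME motif in Fig. 2 comes with a regular expression, so one simple way to build the presence/absence vector of Eq. (3) is to test each expression against the sequence. The patterns below are hypothetical placeholders, not the motifs of Fig. 2. Regular-expression matching is also a simplification: MEME motifs are probabilistic profiles, and the document does not state whether presence was decided by regular expression or by a motif-scanning score.

```python
import re

# Hypothetical MEME-style regular expressions (placeholders, not the Fig. 2 motifs)
MOTIFS = [
    r"CC[KR]..C",
    r"[DN]Y.C",
]


def motif_presence(seq: str, patterns: list[str]) -> list[int]:
    """Binary vector: 1 if a motif's regular expression matches anywhere in seq, else 0."""
    return [1 if re.search(p, seq) else 0 for p in patterns]


print(motif_presence("ACCKAACW", MOTIFS))
print(motif_presence("NYAC", MOTIFS))
```

With the six real expressions in place of the placeholders, this would yield the 6-D vector of Eq. (3).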

Prosite motif features

In this study, 11 kinds of Prosite motifs32 were found in 78 presynaptic neurotoxin sequences and 2 kinds of Prosite motifs were found in 69 postsynaptic neurotoxin sequences. The total number of motif features was 13. Consequently, each protein sequence was converted into a 13-dimension (13-D) space, formulated as:

$$P = \{p_1, p_2, \ldots, p_{13}\} \tag{4}$$
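The actual scans were performed with ScanProsite. To show how a PROSITE pattern maps onto a presence/absence feature, the sketch below converts the basic pattern syntax into a Python regular expression. It supports x, [..], {..}, (n) and (n,m) repeats, and the < and > terminal anchors, and deliberately rejects rarer forms such as '>' inside brackets. The example pattern is hypothetical, not one of the 13 motifs used here, and the function names are our own.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (e.g. 'C-x(2,4)-[ST]-{P}-C.') to a Python regex."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):  # N-terminal anchor
            regex += "^"
            element = element[1:]
        anchor_end = element.endswith(">")  # C-terminal anchor
        if anchor_end:
            element = element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."  # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # any amino acid except those listed
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        regex += residue
        if anchor_end:
            regex += "$"
    return regex


def prosite_features(seq: str, patterns: list[str]) -> list[int]:
    """Binary vector: 1 if the sequence contains a match to the PROSITE pattern, else 0."""
    return [1 if re.search(prosite_to_regex(p), seq) else 0 for p in patterns]


print(prosite_to_regex("C-x(2,4)-[ST]-{P}-C."))
print(prosite_features("GCAASKC", ["C-x(2,4)-[ST]-{P}-C."]))
```

Applied to all 13 patterns found here, the same loop would produce the 13-D vector of Eq. (4).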

InterPro motif features

InterPro is an integrated database of protein families, domains and functional sites33. In this study, 78 presynaptic neurotoxin sequences and 69 postsynaptic neurotoxin sequences were scanned by InterPro, and 46 functional motifs were found in the neurotoxin datasets. The total number of motif features was 46. Consequently, each protein sequence was converted into a 46-dimension (46-D) space, formulated as:

$$N = \{n_1, n_2, \ldots, n_{46}\} \tag{5}$$

Features for prediction algorithms

To improve prediction accuracy, the 400 PseAA compositions, the 50 selected PseAA compositions, 13 kinds of Prosite motifs, 6 kinds of MEME motifs and 46 InterPro motifs were combined. Because the Prosite motifs are contained in the InterPro motifs, the 13 Prosite motifs were not combined with the 46 InterPro motifs. The resulting 12 parameter sets, P1-P12, were used as the input parameters of ID, MNBC, RF, and IBK (Table 3).

Table 3.

Combination of dipeptide parameters and motif parameters.

Parameters Number Description of parameters
P1 400 400 dipeptides
P2 406 400 dipeptides and 6 kinds of MEME motifs
P3 413 400 dipeptides and 13 kinds of Prosite motifs
P4 419 400 dipeptides, 6 kinds of MEME motifs and 13 kinds of Prosite motifs
P5 446 400 dipeptides and 46 kinds of InterPro motifs
P6 452 400 dipeptides, 6 kinds of MEME motifs and 46 kinds of InterPro motifs
P7 50 50 dipeptides selected by MRMR
P8 56 50 dipeptides and 6 kinds of MEME motifs
P9 63 50 dipeptides and 13 kinds of Prosite motifs
P10 69 50 dipeptides, 13 kinds of Prosite motifs and 6 kinds of MEME motifs
P11 96 50 dipeptides and 46 kinds of InterPro motifs
P12 102 50 dipeptides, 46 kinds of InterPro motifs and 6 kinds of MEME motifs
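The dimensions in Table 3 follow from concatenating feature blocks. The following minimal sketch encodes the block composition of each parameter set, checks the totals, and shows how one sample's blocks would be joined. The block labels (DP400, DP50 and so on) are our own, and, consistent with the text above, no set combines Prosite with InterPro motifs.

```python
# Sizes of the individual feature blocks (labels are illustrative)
BLOCK_SIZES = {"DP400": 400, "DP50": 50, "MEME": 6, "PROSITE": 13, "INTERPRO": 46}

# Block composition of each parameter set, following Table 3
PARAMETER_SETS = {
    "P1": ["DP400"],
    "P2": ["DP400", "MEME"],
    "P3": ["DP400", "PROSITE"],
    "P4": ["DP400", "MEME", "PROSITE"],
    "P5": ["DP400", "INTERPRO"],
    "P6": ["DP400", "MEME", "INTERPRO"],
    "P7": ["DP50"],
    "P8": ["DP50", "MEME"],
    "P9": ["DP50", "PROSITE"],
    "P10": ["DP50", "PROSITE", "MEME"],
    "P11": ["DP50", "INTERPRO"],
    "P12": ["DP50", "INTERPRO", "MEME"],
}


def dimension(name: str) -> int:
    """Total number of features in parameter set `name`."""
    return sum(BLOCK_SIZES[block] for block in PARAMETER_SETS[name])


def combine(blocks: dict[str, list[float]], name: str) -> list[float]:
    """Concatenate one sample's feature blocks in the order used by parameter set `name`."""
    vector: list[float] = []
    for block in PARAMETER_SETS[name]:
        if len(blocks[block]) != BLOCK_SIZES[block]:
            raise ValueError(f"{block} should have {BLOCK_SIZES[block]} values")
        vector.extend(blocks[block])
    return vector


for name in PARAMETER_SETS:
    print(name, dimension(name))
```

The printed totals match the "Number" column of Table 3.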

Evaluation of methods

To comprehensively evaluate the accuracy of our predictor, the sensitivity (Sn), specificity (Sp), correlation coefficient (CC) and overall accuracy (Acc) were calculated:

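$$
\begin{cases}
\mathrm{Sn} = \dfrac{TP}{TP+FN}\\[6pt]
\mathrm{Sp} = \dfrac{TP}{TP+FP}\\[6pt]
\mathrm{CC} = \dfrac{(TP\times TN)-(FP\times FN)}{\sqrt{(TP+FP)\times(TN+FN)\times(TP+FN)\times(TN+FP)}}\\[6pt]
\mathrm{Acc} = \dfrac{\sum_i TP_i}{N}
\end{cases}
\tag{6}
$$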

where TP denotes the number of correctly recognized positives, FN the number of positives recognized as negatives, FP the number of negatives recognized as positives, TN the number of correctly recognized negatives, $TP_i$ the number of correctly recognized sequences of class $i$, and N the total number of protein sequences.
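A worked check of Eq. (6) against the tables: taking presynaptic neurotoxins as positives, the P1 row for ID in Table 1 implies TP = 69 and FN = 9 (88.46% of 78), and TN = 63 and FP = 6 (91.30% of 69 postsynaptic neurotoxins). The sketch below computes the four metrics from such counts; the function name is our own.

```python
import math


def metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Sn, Sp, CC and Acc as defined in Eq. (6) for a two-class problem."""
    return {
        "Sn": tp / (tp + fn),
        # Eq. (6) defines Sp as TP / (TP + FP): the share of predicted positives that are correct
        "Sp": tp / (tp + fp),
        "CC": (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp)),
        # with two classes, sum_i TP_i = TP + TN
        "Acc": (tp + tn) / (tp + fn + fp + tn),
    }


# ID with P1, presynaptic neurotoxins as positives
pre = metrics(tp=69, fn=9, fp=6, tn=63)
# The same predictions with postsynaptic neurotoxins as positives
post = metrics(tp=63, fn=6, fp=9, tn=69)
print({k: round(v, 4) for k, v in pre.items()})
print({k: round(v, 4) for k, v in post.items()})
```

These counts reproduce the P1 entries for ID (Sn 88.46% and Sp 92.00% for presynaptic, Sn 91.30% and Sp 87.50% for postsynaptic, Acc 89.80%, CC 0.7963). CC comes out identical whichever class is treated as positive, which is why Table 2 reports a single CC per algorithm.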

This set of metrics is valid only for single-label systems. For multi-label systems, which have become increasingly common in systems biology75 and systems medicine40, 76, a completely different set of metrics, as defined in the work of Chou77, is needed. To take advantage of Chou’s intuitive set of metrics, used in studies of protein signal peptide cleavage sites42, 43, 47–49, 78–82, TP, TN, FP and FN can be represented as follows:

$$
\begin{cases}
TP = N^{+} - N^{+}_{-}\\
TN = N^{-} - N^{-}_{+}\\
FP = N^{-}_{+}\\
FN = N^{+}_{-}
\end{cases}
\tag{7}
$$

Substituting Eq. (7) into Eq. (6), we can obtain the following metrics:

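$$
\begin{cases}
\mathrm{Sn} = 1 - \dfrac{N^{+}_{-}}{N^{+}}\\[6pt]
\mathrm{Sp} = \dfrac{N^{+} - N^{+}_{-}}{N^{+} - N^{+}_{-} + N^{-}_{+}}\\[6pt]
\mathrm{Acc} = 1 - \dfrac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}\\[6pt]
\mathrm{CC} = \dfrac{1 - \left(\dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}}\right)}{\sqrt{\left(1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}}\right)\left(1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}}\right)}}
\end{cases}
\tag{8}
$$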

where $N^{+}$ denotes the total number of positives, $N^{-}$ the total number of negatives, $N^{-}_{+}$ the number of negatives incorrectly predicted as positives, and $N^{+}_{-}$ the number of positives incorrectly predicted as negatives. In addition, the jackknife test, in which each sequence is predicted by a model built from all the remaining sequences, was used to evaluate the predictive power of our algorithms.
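Because Eq. (8) is Eq. (6) rewritten through Eq. (7), both must give identical numbers. A small check, using the counts implied by the P1 row for ID in Table 1 (N+ = 78 presynaptic, N− = 69 postsynaptic, 9 presynaptic neurotoxins predicted as postsynaptic, 6 postsynaptic predicted as presynaptic); the function names are our own.

```python
import math


def chou_metrics(n_pos: int, n_neg: int, n_pos_missed: int, n_neg_wrong: int) -> dict[str, float]:
    """Eq. (8) with n_pos = N+, n_neg = N-, n_pos_missed = N+_- (FN), n_neg_wrong = N-_+ (FP)."""
    sn = 1 - n_pos_missed / n_pos
    sp = (n_pos - n_pos_missed) / (n_pos - n_pos_missed + n_neg_wrong)
    acc = 1 - (n_pos_missed + n_neg_wrong) / (n_pos + n_neg)
    cc = (1 - (n_pos_missed / n_pos + n_neg_wrong / n_neg)) / math.sqrt(
        (1 + (n_neg_wrong - n_pos_missed) / n_pos) * (1 + (n_pos_missed - n_neg_wrong) / n_neg)
    )
    return {"Sn": sn, "Sp": sp, "Acc": acc, "CC": cc}


def standard_cc(tp: int, fn: int, fp: int, tn: int) -> float:
    """CC from Eq. (6), for comparison."""
    return (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))


res = chou_metrics(78, 69, 9, 6)
print({k: round(v, 4) for k, v in res.items()})
print(abs(res["CC"] - standard_cc(tp=78 - 9, fn=9, fp=6, tn=69 - 6)))
```

The appeal of Eq. (8) is its readability: with no errors (N+_- = N-_+ = 0) every metric equals 1, and each term shows directly how missed positives and false alarms pull the scores down.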

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 31501078, No. 61561036, and No. 61602135), the Heilongjiang Postdoctoral Research Foundation (No. LBH-Z15153) and the China Postdoctoral Science Foundation (No. 2016M590290).

Author Contributions

L.Y., T.L., and Y.Z. conceived and designed the experiments. H.H. and L.Y. performed the experiments. L.Y. and H.H. analyzed the data. S.W. and Y.L. contributed materials/analysis tools. H.H. and L.Y. wrote the paper.

Competing Interests

The authors declare that they have no competing interests.

Footnotes

Haiyan Huo and Tao Li contributed equally to this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yongchun Zuo, Email: yczuo@imu.edu.cn.

Lei Yang, Email: yanglei_hmu@163.com.

References

  • 1.Afifiyan F, et al. Four new postsynaptic neurotoxins from Naja naja sputatrix venom: cDNA cloning, protein expression, and phylogenetic analysis. Toxicon. 1998;36:1871–1885. doi: 10.1016/S0041-0101(98)00108-1. [DOI] [PubMed] [Google Scholar]
  • 2.Harris JB. Polypeptides from snake venoms which act on nerve and muscle. Prog. Med. Chem. 1984;21:63–110. doi: 10.1016/S0079-6468(08)70407-7. [DOI] [PubMed] [Google Scholar]
  • 3.Rossetto O, Rigoni M, Montecucco C. Different mechanism of blockade of neuroexocytosis by presynaptic neurotoxins. Toxicol. Lett. 2004;149:91–101. doi: 10.1016/j.toxlet.2003.12.023. [DOI] [PubMed] [Google Scholar]
  • 4.Hodgson WC, Dal Belo CA, Rowan EG. The neuromuscular activity of paradoxin: a presynaptic neurotoxin from the venom of the inland taipan (Oxyuranus microlepidotus) Neuropharmacology. 2007;52:1229–1236. doi: 10.1016/j.neuropharm.2007.01.002. [DOI] [PubMed] [Google Scholar]
  • 5.Hodgson WC, Wickramaratna JC. In vitro neuromuscular activity of snake venoms. Clin. Exp. Pharmacol. Physiol. 2002;29:807–814. doi: 10.1046/j.1440-1681.2002.03740.x. [DOI] [PubMed] [Google Scholar]
  • 6.Marcon F, Nicholson GM. Identification of presynaptic neurotoxin complexes in the venoms of three Australian copperheads (Austrelaps spp.) and the efficacy of tiger snake antivenom to prevent or reverse neurotoxicity. Toxicon. 2011;58:439–452. doi: 10.1016/j.toxicon.2011.08.003. [DOI] [PubMed] [Google Scholar]
  • 7.Montecucco C, Rossetto O. How do presynaptic PLA2 neurotoxins block nerve terminals. Trends Biochem. Sci. 2000;25:266–270. doi: 10.1016/S0968-0004(00)01556-5. [DOI] [PubMed] [Google Scholar]
  • 8.Montecucco C, et al. Different mechanisms of inhibition of nerve terminals by botulinum and snake presynaptic neurotoxins. Toxicon. 2009;54:561–564. doi: 10.1016/j.toxicon.2008.12.012. [DOI] [PubMed] [Google Scholar]
  • 9.Tang L, Zhou YC, Lin ZJ. Crystal structure of agkistrodotoxin, a phospholipase A2-type presynaptic neurotoxin from agkistrodon halys pallas. J. Mol. Biol. 1998;282:1–11. doi: 10.1006/jmbi.1998.1987. [DOI] [PubMed] [Google Scholar]
  • 10. Connolly S, et al. Neuromuscular effects of Papuan Taipan snake venom. Ann. Neurol. 1995;38:916–920. doi: 10.1002/ana.410380612.
  • 11. Harris JB. Snake venoms in science and clinical medicine. 3. Neuropharmacological aspects of the activity of snake venoms. Trans. R. Soc. Trop. Med. Hyg. 1989;83:745–747. doi: 10.1016/0035-9203(89)90313-1.
  • 12. Phui Yee JS, et al. Snake postsynaptic neurotoxins: gene structure, phylogeny and applications in research and therapy. Biochimie. 2004;86:137–149. doi: 10.1016/j.biochi.2003.11.012.
  • 13. Jeyaseelan K, Poh SL, Nair R, Armugam A. Structurally conserved alpha-neurotoxin genes encode functionally diverse proteins in the venom of Naja sputatrix. FEBS Lett. 2003;553:333–341. doi: 10.1016/S0014-5793(03)01039-1.
  • 14. Halpert J, Fohlman J, Eaker D. Amino acid sequence of a postsynaptic neurotoxin from the venom of the Australian tiger snake Notechis scutatus scutatus. Biochimie. 1979;61:719–723. doi: 10.1016/S0300-9084(79)80172-8.
  • 15. Afifiyan F, Armugam A, Tan CH, Gopalakrishnakone P, Jeyaseelan K. Postsynaptic alpha-neurotoxin gene of the spitting cobra, Naja naja sputatrix: structure, organization, and phylogenetic analysis. Genome Res. 1999;9:259–266.
  • 16. Gong N, Armugam A, Jeyaseelan K. Postsynaptic short-chain neurotoxins from Pseudonaja textilis. cDNA cloning, expression and protein characterization. Eur. J. Biochem. 1999;265:982–989. doi: 10.1046/j.1432-1327.1999.00800.x.
  • 17. Tamiya T, Ohno S, Nishimura E, Fujimi TJ, Tsuchiya T. Complete nucleotide sequences of cDNAs encoding long chain alpha-neurotoxins from sea krait, Laticauda semifasciata. Toxicon. 1999;37:181–185. doi: 10.1016/S0041-0101(98)00181-0.
  • 18. Rossetto O, Montecucco C. Presynaptic neurotoxins with enzymatic activities. Handb. Exp. Pharmacol. 2008:129–170.
  • 19. Naamati G, Askenazi M, Linial M. ClanTox: a classifier of short animal toxins. Nucleic Acids Res. 2009;37:W363–W368. doi: 10.1093/nar/gkp299.
  • 20. Naamati G, Askenazi M, Linial M. A predictor for toxin-like proteins exposes cell modulator candidates within viral genomes. Bioinformatics. 2010;26:i482–i488. doi: 10.1093/bioinformatics/btq375.
  • 21. Guang XM, Guo YZ, Wang X, Li ML. Prediction of neurotoxins by support vector machine based on multiple feature vectors. Interdiscip. Sci. 2010;2:241–246. doi: 10.1007/s12539-010-0044-7.
  • 22. Saha S, Raghava GP. Prediction of neurotoxins based on their function and source. In Silico Biol. 2007;7:369–387.
  • 23. Saha S, Raghava GP. BTXpred: prediction of bacterial toxins. In Silico Biol. 2007;7:405–412.
  • 24. Mondal S, Bhavna R, Mohan Babu R, Ramakumar S. Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. J. Theor. Biol. 2006;243:252–260. doi: 10.1016/j.jtbi.2006.06.014.
  • 25. Lin H, Li QZ. Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem. Biophys. Res. Commun. 2007;354:548–551. doi: 10.1016/j.bbrc.2007.01.011.
  • 26. Yin JB, Fan YX, Shen HB. Conotoxin superfamily prediction using diffusion maps dimensionality reduction and subspace classifier. Curr. Protein Pept. Sci. 2011;12:580–588. doi: 10.2174/138920311796957702.
  • 27. Fan YX, Song J, Shen HB, Kong X. PredCSF: an integrated feature-based approach for predicting conotoxin superfamily. Protein Pept. Lett. 2011;18:261–267. doi: 10.2174/092986611794578341.
  • 28. He Q, et al. ATDB 2.0: A database integrated toxin-ion channel interaction data. Toxicon. 2010;56:644–647. doi: 10.1016/j.toxicon.2010.05.013.
  • 29. He QY, et al. ATDB: a uni-database platform for animal toxins. Nucleic Acids Res. 2008;36:D293–D297. doi: 10.1093/nar/gkm832.
  • 30. Yang L, Li Q. Prediction of presynaptic and postsynaptic neurotoxins by the increment of diversity. Toxicol. In Vitro. 2009;23:346–348. doi: 10.1016/j.tiv.2008.12.015.
  • 31. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34:W369–W373. doi: 10.1093/nar/gkl198.
  • 32. Sigrist CJ, et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2010;38:D161–D166. doi: 10.1093/nar/gkp885.
  • 33. Hunter S, et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 2009;37:D211–D215. doi: 10.1093/nar/gkn785.
  • 34. Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005;3:185–205. doi: 10.1142/S0219720005001004.
  • 35. Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005;27:1226–1238. doi: 10.1109/TPAMI.2005.159.
  • 36. Liu Z, et al. pRNAm-PC: Predicting N6-methyladenosine sites in RNA sequences via physical–chemical properties. Anal. Biochem. 2016;497:60–67. doi: 10.1016/j.ab.2015.12.017.
  • 37. Chen W, Tang H, Ye J, Lin H, Chou KC. iRNA-PseU: Identifying RNA pseudouridine sites. Mol. Ther. Nucleic Acids. 2016;5:e332. doi: 10.1038/mtna.2016.37.
  • 38. Jia JH, Liu Z, Xiao X, Liu BX, Chou KC. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J. Theor. Biol. 2016;394:223–230. doi: 10.1016/j.jtbi.2016.01.020.
  • 39. Liu B, Long R, Chou KC. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics. 2016;32:2411–2418. doi: 10.1093/bioinformatics/btw186.
  • 40. Cheng X, Zhao SG, Xiao X, Chou KC. iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals. Bioinformatics. 2017;33:341–346. doi: 10.1093/bioinformatics/btx098.
  • 41. Chen W, et al. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget. 2017;8:4208–4217. doi: 10.18632/oncotarget.13758.
  • 42. Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC. Sci. Rep. 2017;7:42362. doi: 10.1038/srep42362.
  • 43. Liu B, Wang SY, Long R, Chou KC. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics. 2017;33:35–41. doi: 10.1093/bioinformatics/btw539.
  • 44. Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011;273:236–247. doi: 10.1016/j.jtbi.2010.12.024.
  • 45. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 2007;24:1596–1599. doi: 10.1093/molbev/msm092.
  • 46. Chou KC, Shen HB. Recent advances in developing web-servers for predicting protein attributes. Nat. Sci. 2009;1:63–92.
  • 47. Chen W, Ding H, Feng PM, Lin H, Chou KC. iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget. 2016;7:16895–16909. doi: 10.18632/oncotarget.7815.
  • 48. Jia JH, Zhang LX, Liu Z, Xiao X, Chou KC. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics. 2016;32:3133–3141. doi: 10.1093/bioinformatics/btw387.
  • 49. Zhang CJ, et al. iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget. 2016;7:69783–69793. doi: 10.18632/oncotarget.11975.
  • 50. Jia JH, Liu Z, Xiao X, Liu BX, Chou KC. iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC. Oncotarget. 2016;7:34558–34570. doi: 10.18632/oncotarget.9148.
  • 51. Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget. 2016;7:44310–44321. doi: 10.18632/oncotarget.10027.
  • 52. Qiu WR, Xiao X, Xu ZC, Chou KC. iPhos-PseEn: Identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget. 2016;7:51270–51283. doi: 10.18632/oncotarget.9987.
  • 53. Xiao X, Ye HX, Liu Z, Jia JH, Chou KC. iROS-gPseKNC: Predicting replication origin sites in DNA by incorporating dinucleotide position-specific propensity into general pseudo nucleotide composition. Oncotarget. 2016;7:34180–34189. doi: 10.18632/oncotarget.9057.
  • 54. Liu B, Wu H, Zhang DY, Wang XL, Chou KC. Pse-Analysis: a python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods. Oncotarget. 2017;8:13338–13343. doi: 10.18632/oncotarget.14524.
  • 55. Chou KC. Impacts of bioinformatics to medicinal chemistry. Med. Chem. 2015;11:218–234. doi: 10.2174/1573406411666141229162834.
  • 56. Wang G, Dunbrack RL Jr. PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res. 2005;33:W94–W98. doi: 10.1093/nar/gki402.
  • 57. Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics. 2003;19:1589–1591. doi: 10.1093/bioinformatics/btg224.
  • 58. Zhang L, Luo L. Splice site prediction with quadratic discriminant analysis using diversity measure. Nucleic Acids Res. 2003;31:6214–6220. doi: 10.1093/nar/gkg805.
  • 59. Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics. 2004;20:2479–2481. doi: 10.1093/bioinformatics/bth261.
  • 60. Chou KC. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Struct. Funct. Genet. 2001;43:246–255. doi: 10.1002/prot.1035.
  • 61. Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005;21:10–19. doi: 10.1093/bioinformatics/bth466.
  • 62. Du PF, Gu SW, Jiao YS. PseAAC-General: Fast Building Various Modes of General Form of Chou’s Pseudo-Amino Acid Composition for Large-Scale Protein Datasets. Int. J. Mol. Sci. 2014;15:3495–3506. doi: 10.3390/ijms15033495.
  • 63. Liu B, et al. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res. 2015;43:W65–W71. doi: 10.1093/nar/gkv458.
  • 64. Nanni L, Brahnam S, Lumini A. Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition. J. Theor. Biol. 2014;360:109–116. doi: 10.1016/j.jtbi.2014.07.003.
  • 65. Sharma R, et al. Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou’s General PseAAC. IEEE Trans. Nanobiosci. 2015;14:915–926. doi: 10.1109/TNB.2015.2500186.
  • 66. Tahir M, Hayat M. iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou’s PseAAC. Mol. Biosyst. 2016;12:2587–2593. doi: 10.1039/C6MB00221H.
  • 67. Rahimi M, Bakhtiarizadeh MR, Mohammadi-Sangcheshmeh A. OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou’s pseudo amino acid composition. J. Theor. Biol. 2017;414:128–136. doi: 10.1016/j.jtbi.2016.11.028.
  • 68. Chou KC. Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Curr. Proteomics. 2009;6:262–274. doi: 10.2174/157016409789973707.
  • 69. Zuo YC, et al. PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics. 2017;33:122–124. doi: 10.1093/bioinformatics/btw564.
  • 70. Zuo YC, et al. iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition. PLoS One. 2016;10:e0145541. doi: 10.1371/journal.pone.0145541.
  • 71. Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Nat. Sci. 2017;9:67–91. doi: 10.4236/ns.2017.94007.
  • 72. Chou KC, Cai YD. Using functional domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. 2002;277:45765–45769. doi: 10.1074/jbc.M204161200.
  • 73. Zuo YC, et al. Discrimination of membrane transporter protein types using K-nearest neighbor method derived from the similarity distance of total diversity measure. Mol. Biosyst. 2015;11:950–957. doi: 10.1039/C4MB00681J.
  • 74. Zuo YC, et al. Predicting peroxidase subcellular location by hybridizing different descriptors of Chou’ pseudo amino acid patterns. Anal. Biochem. 2014;458:14–19. doi: 10.1016/j.ab.2014.04.032.
  • 75. Chou KC, Wu ZC, Xiao X. iLoc-Euk: A Multi-Label Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Eukaryotic Proteins. PLoS One. 2011;6:e18258. doi: 10.1371/journal.pone.0018258.
  • 76. Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics. 2016;32:3116–3123. doi: 10.1093/bioinformatics/btw380.
  • 77. Chou KC. Some remarks on predicting multi-label attributes in molecular biosystems. Mol. Biosyst. 2013;9:1092–1100. doi: 10.1039/c3mb25555g.
  • 78. Chou KC. Prediction of protein signal sequences. Curr. Protein Pept. Sci. 2002;3:615–622. doi: 10.2174/1389203023380468.
  • 79. Chen W, Feng PM, Lin H, Chou KC. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 2013;41:e68. doi: 10.1093/nar/gks1450.
  • 80. Chen JJ, Long R, Wang XL, Liu B, Chou KC. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci. Rep. 2016;6:32333. doi: 10.1038/srep32333.
  • 81. Chen W, Feng PM, Ding H, Lin H, Chou KC. Using deformation energy to analyze nucleosome positioning in genomes. Genomics. 2016;107:69–75. doi: 10.1016/j.ygeno.2015.12.005.
  • 82. Liu B, Fang LY, Long R, Lan X, Chou KC. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics. 2016;32:362–369. doi: 10.1093/bioinformatics/btv604.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group