Abstract
Conotoxins are small disulfide-rich neurotoxic peptides that can bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been studied as drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel type. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins based on their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudoamino acid composition, followed by a feature optimization procedure that reduces the sample representation from a 400-D to a 50-D vector. The overall success rate achieved by iCTX-Type in a rigorous jackknife cross-validation was over 91%, outperforming its counterpart (RBF network). Besides, iCTX-Type is so far the only predictor in this area with a web server available, and hence is particularly useful for experimental scientists who wish to get their desired results without following the complicated mathematics involved.
1. Introduction
Conotoxins are peptides of about 10 to 30 amino acid residues, secreted by cone snails for capturing prey and defending themselves. These toxins can bind to various targets, such as G protein-coupled receptors (GPCRs), nicotinic acetylcholine receptors, and neurotensin receptors. In particular, they display extremely high specificity and affinity for ion channels. Ion channels represent a class of membrane-spanning protein pores that mediate the flux of ions in a variety of cell types. There are over 300 types of ion channels in a living cell [1]. Many crucial functions in life, such as heartbeat, sensory transduction, and central nervous system response, are controlled by cell signaling via various ion channels. Ion channel dysfunction may lead to a number of diseases, such as epilepsy, arrhythmia, and type II diabetes. These diseases are primarily treated with drugs that modulate the ion channels concerned. Ion channels are also important targets for treating viral diseases (see, e.g., [2–4]). Owing to their importance to human life, ion channels have become the second most frequent targets for drug development, next only to GPCRs [5]. The following three kinds of ion channels are usually targeted by conotoxins: potassium (K) channel (Figure 1), sodium (Na) channel (Figure 2), and calcium (Ca) channel (Figure 3). Based on their functions and targets, conotoxins can be classified into the following three types: (i) K-channel-targeting type; (ii) Na-channel-targeting type; and (iii) Ca-channel-targeting type.
Figure 1.

A ribbon drawing to show the human potassium (K) channel. Reproduced from Chou [6] with permission.
Figure 2.

A ribbon drawing to show the human sodium (Na) channel. Reproduced from Chou [6] with permission.
Figure 3.

A ribbon drawing to show the calcium (Ca) channel from hepatitis C virus. Reproduced from [4] with permission.
Although conotoxins are lethally venomous because of blocking the transmission of nerve impulses, they have been widely used to treat chronic pain, epilepsy, spasticity, and cardiovascular diseases. Therefore, conotoxins have been regarded as important pharmacological tools for neuroscience research.
It has been estimated that there are more than 100,000 kinds of conotoxins secreted by over 700 species of Conus in the world [8]. However, far fewer conotoxins (about 3,000 peptides) have been experimentally confirmed and reported in the literature and databases. Moreover, public databases contain no more than 300 records about the functions of conotoxins. Hence, developing a computational method to predict the functions of conotoxins has become a challenging task.
In a pioneering work, Mondal et al. [9] proposed a method for predicting conotoxin superfamilies by using the pseudoamino acid composition approach [10, 11]. Subsequently, a series of studies on predicting conotoxin superfamilies have been reported (see, for example, [12–15]). All these methods yielded quite encouraging results, and each of them did play a role in stimulating the development of this area. However, none of these methods can be used to predict the types of conotoxins defined according to the ion channels they target. For instance, both delta-conotoxin-like Ac6.1 (UniProt accession number: P0C8V5) [16] and omega-conotoxin-like Ai6.2 (UniProt accession number: P0CB10) [17] belong to the conotoxin O1 superfamily. However, the former targets the voltage-gated sodium channels, while the latter targets the voltage-gated calcium channels.
To deal with this problem, recently, a method was developed [7] to identify conotoxins among the aforementioned three types by using their sequence information alone. However, further work is needed in this regard due to the following reasons. (i) The prediction quality can be further improved. (ii) No web server for the prediction method in [7] was provided, and hence its usage is quite limited, especially for the majority of experimental scientists.
The present study was devoted to developing a new predictor for identifying the types of conotoxins that addresses the above two aspects.
As elaborated in a comprehensive review [18] and demonstrated in a series of recent publications [19–28], to establish a really useful statistical predictor for a biological system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the biological samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web server for the predictor that is accessible to the public. In what follows, let us describe how to deal with these procedures one by one.
2. Materials and Methods
2.1. Benchmark Dataset
The sequences of conotoxins and their functions were collected from the UniProt [29]. To ensure its quality, the benchmark dataset was constructed strictly according to the following criteria. (i) Included were only those peptides annotated with “conotoxin” and with the keyword of potassium, calcium, or sodium in their functional ontologies. (ii) Included were only those conotoxins with clear functional annotations based on experiment results. In other words, we excluded those annotated with “uncertain,” “predicted,” or “inferred from homology” because of lacking confidence. (iii) Excluded were those that were annotated with “immature” due to the incompleteness. (iv) Excluded were also those that contained any invalid amino acid codes, such as “B,” “X,” and “Z”. After going through the above procedures, we obtained 195 conotoxins, of which 37 belonged to the K-channel-targeting type, 86 to the Na-channel-targeting type, and 72 to the Ca-channel-targeting type.
As elaborated in a comprehensive review [18], a benchmark dataset containing many redundant samples with high similarity would lack statistical representativeness. A predictor, if trained and tested on a benchmark dataset with many homologous sequences, might yield misleading results with overestimated accuracy [30]. To remove the homologous sequences from the benchmark dataset, a cutoff threshold of 25% was recommended [31] to exclude those protein/peptide sequences that had ≥25% pairwise sequence identity to any other sample in the same subset. However, in this study we did not use such a stringent criterion because the currently available data did not allow us to do so; otherwise, some subsets would contain too few peptides to have statistical significance. As a compromise, we set the cutoff threshold at 80% and used the CD-HIT software [32] to remove those conotoxin samples that had ≥80% sequence identity to any other in the same subset. After such a screening procedure, we obtained 112 conotoxin samples for the benchmark dataset 𝕊, formulated as follows:
$$\mathbb{S} = \mathbb{S}_{K} \cup \mathbb{S}_{Na} \cup \mathbb{S}_{Ca} \tag{1}$$
where the subset $\mathbb{S}_{K}$ contains 24 conotoxin samples of K-channel-targeting type, $\mathbb{S}_{Na}$ contains 43 samples of Na-channel-targeting type, and $\mathbb{S}_{Ca}$ contains 45 samples of Ca-channel-targeting type, while the symbol ∪ represents the union in the set theory. The codes of the 112 conotoxins and their sequences are given in Supporting Information S1 (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/286419).
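Likewise, we also constructed an independent dataset $\mathbb{S}^{\mathrm{Ind}}$ as formulated by
$$\mathbb{S}^{\mathrm{Ind}} = \mathbb{S}_{K}^{\mathrm{Ind}} \cup \mathbb{S}_{Na}^{\mathrm{Ind}} \cup \mathbb{S}_{Ca}^{\mathrm{Ind}} \tag{2}$$
where $\mathbb{S}_{K}^{\mathrm{Ind}}$ contains 12 K-conotoxins, $\mathbb{S}_{Na}^{\mathrm{Ind}}$ contains 37 Na-conotoxins, and $\mathbb{S}_{Ca}^{\mathrm{Ind}}$ contains 21 Ca-conotoxins. None of the samples in the independent dataset occurs in the dataset 𝕊 of (1), and their detailed sequences are given in Supporting Information S2.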
For simplicity, hereafter, let us use “K-conotoxin,” “Na-conotoxin,” and “Ca-conotoxin” to represent K-channel-targeting type conotoxin, Na-channel-targeting type conotoxin, and Ca-channel-targeting type conotoxin, respectively.
2.2. The Dipeptide Mode of Pseudoamino Acid Composition
Given a conotoxin peptide P with L amino acids, how do we translate it into a mathematical expression for statistical prediction? This is one of the first important problems in developing a sequence-based predictor for identifying the type of a conotoxin. The most straightforward way to formulate the sample of a conotoxin peptide P with L residues is to use its entire amino acid sequence, as can be formulated by
$$\mathbf{P} = R_1 R_2 R_3 R_4 \cdots R_L \tag{3}$$
where $R_1$ represents the 1st residue of the conotoxin peptide, $R_2$ the 2nd residue, and so forth. Subsequently, we can utilize various sequence-similarity-search-based tools, such as BLAST [33], to perform statistical prediction. Although this kind of sequence model is very straightforward and intuitive, it unfortunately fails to work when a query conotoxin peptide does not have significant similarity to any of the peptide sequences in the training dataset. Thus, investigators turned to vectors to represent the peptide samples. Another reason for doing so is that statistical samples in vector format are much easier to handle than those in sequence format by many existing operation engines, such as the correlation angle approach [34], covariance discriminant (CD) [27, 35–37], neural network [38–40], optimization approach [41], support vector machine (SVM) [22, 23, 42, 43], random forest [44, 45], conditional random field [20], nearest neighbor (NN) [46, 47], K-nearest neighbor (KNN) [30], OET-KNN [48–50], fuzzy K-nearest neighbor [25, 51–55], ML-KNN algorithm [56], and SLLE algorithm [36].
The simplest vector used to represent a peptide or protein sample is its amino acid composition (AAC), given by
$$\mathbf{P} = \left[ f_1 \;\; f_2 \;\; \cdots \;\; f_{20} \right]^{\mathbf{T}} \tag{4}$$
where $f_i$ ($i = 1, 2, \ldots, 20$) is the normalized occurrence frequency of the $i$th type of native amino acid in the peptide chain and $\mathbf{T}$ is the transpose operator. The AAC model was used by many in predicting various attributes of proteins (see, e.g., [41, 57–59]). However, as we can see from (4), when AAC is used to represent a peptide or protein sample, all its sequence order information is completely lost, which limits the prediction quality.
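As an illustration of (4), the following minimal Python sketch computes the 20-D AAC vector. The function name, the alphabetical ordering of the residues, and the toy peptides are assumptions made for this example only; they are not taken from the iCTX-Type implementation.

```python
from collections import Counter

# The 20 native amino acids in alphabetical order of their one-letter codes
# (the ordering is an assumption for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-D vector [f_1, ..., f_20] of normalized residue frequencies."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    invalid = set(counts) - set(AMINO_ACIDS)
    if invalid:
        raise ValueError("invalid residue codes: " + ", ".join(sorted(invalid)))
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


# Two hypothetical peptides with the same residues in a different order
# receive identical AAC vectors: the sequence order is not captured.
same_vector = amino_acid_composition("CCKR") == amino_acid_composition("RKCC")
```

Here `same_vector` evaluates to true, which is exactly the loss of sequence order information noted above.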
How can we formulate a peptide or protein sequence with a vector yet still keep considerable sequence order information? As reported in many recent publications, in order to incorporate the sequence order information, the pseudoamino acid composition [10, 11] or Chou's PseAAC [60] was proposed. Since the concept of PseAAC was proposed in 2001 [10], it has been penetrating into almost all the fields of protein attribute predictions (see, e.g., [61–78]). Recently, the concept of PseAAC was further extended to represent the feature vectors of DNA and nucleotides [19, 21, 23, 27, 79], as well as other biological samples (see, e.g., [80–82]). Because it has been widely and increasingly used, in addition to the web server “PseAAC” [83] built in 2008, recently three types of powerful open access software, called “PseAAC-Builder” [84], “propy” [85], and “PseAAC-General” [86], were established: the former two are for generating various modes of Chou's special PseAAC, while the 3rd one is for those of Chou's general PseAAC.
According to a comprehensive review [18], the general PseAAC is formulated by
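$$\mathbf{P} = \left[ \psi_1 \;\; \psi_2 \;\; \cdots \;\; \psi_u \;\; \cdots \;\; \psi_{\Omega} \right]^{\mathbf{T}} \tag{5}$$
where the component $\psi_u$ ($u = 1, 2, \ldots, \Omega$) and the dimension $\Omega$ will depend on how to extract the features from the peptide sequences concerned. For the current study, since the conotoxin sequences are not long (about 10–30 residues), we could just consider the sequence order information between two adjacent amino acid residues. Thus, the dimension of the vector $\mathbf{P}$ in (5) is $\Omega = 20 \times 20 = 400$ and each of the components therein is given by
$$\psi_1 = f(\mathrm{AA}), \quad \psi_2 = f(\mathrm{AC}), \quad \ldots, \quad \psi_{399} = f(\mathrm{YW}), \quad \psi_{400} = f(\mathrm{YY}) \tag{6}$$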
where A, C,…, W, Y are, respectively, the single letter codes of 20 native amino acids, f(AA) is the occurrence frequency for the dipeptide AA in the conotoxin sequence (see (3)), and f(AC) is for the dipeptide AC and so forth. The formulation defined by (5)-(6) is actually the dipeptide mode of PseAAC, which can be automatically generated by the PseAAC server [83] for a given peptide or protein sequence.
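The dipeptide mode of (5)-(6) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and normalizing each count by the number of overlapping dipeptides (L − 1) is an assumption about how the "occurrence frequency" is scaled.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# The 400 dipeptides in the order of (6): AA, AC, ..., YW, YY.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-D vector of dipeptide occurrence frequencies."""
    sequence = sequence.upper()
    if len(sequence) < 2:
        raise ValueError("at least two residues are needed")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    # Slide a window of width 2 along the sequence (overlapping dipeptides).
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair not in counts:
            raise ValueError("invalid residue code in " + repr(pair))
        counts[pair] += 1
    total = len(sequence) - 1  # assumed normalization: number of dipeptides
    return [counts[d] / total for d in DIPEPTIDES]
```

Unlike the AAC vector, this representation distinguishes a hypothetical peptide "CCKR" from "RKCC", because the two contain different adjacent pairs.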
2.3. Feature Selection
The original raw features usually contain the redundant information and noise that may negatively affect the prediction quality [87]. Using the feature selection techniques to optimize the feature set can not only enhance the prediction accuracy but also provide useful insights for in-depth understanding of the action mechanism of conotoxins. According to the feature selection algorithm [87], the F-score function is defined by
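$$F(i) = \frac{\sum_{k=1}^{3} \left( \bar{f}_i^{\,k} - \bar{f}_i \right)^2}{\sum_{k=1}^{3} \dfrac{1}{N_k - 1} \sum_{j=1}^{N_k} \left( f_{ij}^{\,k} - \bar{f}_i^{\,k} \right)^2} \tag{7}$$
where $\bar{f}_i^{\,k}$ is the average frequency of the $i$th feature in the $k$th dataset, $\bar{f}_i$ is the average frequency of the $i$th feature over all the datasets concerned, $f_{ij}^{\,k}$ is the frequency of the $i$th feature of the $j$th sequence in the $k$th dataset, and $N_k$ is the number of peptide samples in the $k$th dataset ($k = 1, 2, 3$ for the K-, Na-, and Ca-conotoxin subsets). The program called “fselect.py” was downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools to calculate the F-score defined in (7).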
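To make the F-score concrete, the sketch below implements the multiclass form described above in plain Python: the squared deviations of the class means from the overall mean, divided by the summed within-class sample variances. It is not the "fselect.py" program itself; the function names and the toy data are hypothetical, and every class is assumed to contain at least two samples.

```python
def f_score(groups):
    """F-score of one feature; `groups` holds that feature's values per class."""
    all_values = [v for group in groups for v in group]
    overall_mean = sum(all_values) / len(all_values)
    numerator = 0.0
    denominator = 0.0
    for group in groups:
        class_mean = sum(group) / len(group)
        numerator += (class_mean - overall_mean) ** 2
        # Sample variance within the class (needs at least two samples).
        denominator += sum((v - class_mean) ** 2 for v in group) / (len(group) - 1)
    if denominator == 0.0:
        return float("inf") if numerator > 0.0 else 0.0
    return numerator / denominator


def rank_features(vectors, labels):
    """Return (feature indices sorted by descending F-score, list of scores)."""
    classes = sorted(set(labels))
    scores = []
    for i in range(len(vectors[0])):
        groups = [[v[i] for v, lab in zip(vectors, labels) if lab == c]
                  for c in classes]
        scores.append(f_score(groups))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
toy_vectors = [[0.0, 0.5], [0.1, 0.4], [1.0, 0.5], [0.9, 0.4]]
toy_labels = ["K", "K", "Na", "Na"]
toy_order, toy_scores = rank_features(toy_vectors, toy_labels)
```

In the toy data the first feature obtains a much larger score than the second and is therefore ranked first.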
The larger the F-score of a feature is, the more likely it is to have a better discriminative capability [87]. Accordingly, we ranked the 400 dipeptides in (5) according to their F-scores. Subsequently, based on the ranked dipeptides, we performed the incremental feature selection (IFS) strategy to find an optimal subset of features that yielded the highest predictive accuracy. During the IFS procedure, the feature subset started with the one feature having the highest F-score. A new feature subset was composed when the feature with the second highest F-score was added. By adding the features sequentially from the higher to lower ranks, 400 feature sets would be obtained. Denoting by $d_i$ the dipeptide feature with the $i$th highest F-score, the $\tau$th feature set can be formulated as
$$\mathbb{F}_{\tau} = \left\{ d_1, d_2, \ldots, d_{\tau} \right\} \quad (1 \le \tau \le 400) \tag{8}$$
For each of the 400 feature sets, a prediction model based on the proposed predictive algorithm was constructed and examined with the jackknife cross-validation on the benchmark dataset. By doing so, we obtained an IFS curve in a 2D (dimensional) Cartesian coordinate system with index $\tau$ as the abscissa (or X-coordinate) and the overall accuracy as the ordinate (or Y-coordinate). The optimal feature set is expressed as
$$\mathbb{F}_{\Theta} = \left\{ d_1, d_2, \ldots, d_{\Theta} \right\} \tag{9}$$
with which the IFS curve reached its peak. In other words, in the 2D coordinate system, when X = Θ, the value of the overall accuracy was the maximum. Thus, we used the Θ features to build the final predictor.
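The IFS procedure can be outlined as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, and `evaluate` stands for any callable that returns the overall accuracy of a model built on a given feature subset (in the present study, an SVM examined by the jackknife test); it is left abstract here.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best one.

    ranked_features: feature indices sorted by descending F-score.
    evaluate: callable taking a list of feature indices and returning
              the overall accuracy obtained with those features.
    Returns (optimal feature subset, IFS curve as a list of (tau, accuracy)).
    """
    curve = []
    for tau in range(1, len(ranked_features) + 1):
        subset = ranked_features[:tau]  # the top-tau ranked features
        curve.append((tau, evaluate(subset)))
    # The peak of the IFS curve; on ties the smallest tau is kept.
    best_tau, _ = max(curve, key=lambda point: point[1])
    return ranked_features[:best_tau], curve
```

Keeping the smallest τ when several feature sets reach the same accuracy is a design choice of this sketch, made so that the result is unique and as compact as possible.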
2.4. Support Vector Machine (SVM)
The classification algorithm used in this work was the support vector machine (SVM). The SVM has been widely used in the realm of bioinformatics (see, e.g., [19, 22, 23, 88–90]). Its basic principle is to transform the input vector into a high-dimension Hilbert space and seek a separating hyperplane with the maximal margin in this space by using the decision function:
$$f(\mathbf{X}) = \operatorname{sgn}\left( \sum_{i=1}^{N} y_i \alpha_i K(\mathbf{X}, \mathbf{X}_i) + b \right) \tag{10}$$
where $\mathbf{X}_i$ is the $i$th training vector, $y_i$ represents the type of the $i$th training vector, $\alpha_i$ and $b$ are the coefficients and the bias determined during training, and $K(\mathbf{X}, \mathbf{X}_i)$ is a kernel function which defines an inner product in a high-dimensional feature space. Because of its effectiveness and speed in nonlinear classification, the radial basis function (RBF) kernel was used in the current work. The original SVM was designed for two-class problems. For multiclass problems, several strategies such as one-versus-rest (OVR), one-versus-one (OVO), and DAGSVM have been applied to extend the traditional SVM. In the present study, we used the OVO strategy for multiclass prediction. The SVM software (LibSVM) was downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm. A grid search method was used to optimize the regularization parameter $C$ and the kernel parameter $\gamma$ via the jackknife cross-validation. The search spaces for $C$ and $\gamma$ are $[2^{15}, 2^{-5}]$ and $[2^{-5}, 2^{-15}]$ with steps of $2^{-1}$ and $2$, respectively. For more details about SVM, see a monograph [91].
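The grid search over $C$ and $\gamma$ can be outlined as below. This is a hedged sketch rather than the actual procedure: it assumes that consecutive grid points differ by a factor of two (one unit in the exponent) for both parameters, and `score` is a hypothetical callable that would train an RBF-kernel SVM with the given pair and return its jackknife accuracy.

```python
def parameter_grid(c_exponents=range(15, -6, -1),
                   gamma_exponents=range(-5, -16, -1)):
    """All (C, gamma) pairs with C in 2^15..2^-5 and gamma in 2^-5..2^-15.

    The one-exponent spacing is an assumption of this sketch.
    """
    return [(2.0 ** c, 2.0 ** g)
            for c in c_exponents
            for g in gamma_exponents]


def grid_search(score, grid):
    """Return the (C, gamma) pair for which score(C, gamma) is highest."""
    return max(grid, key=lambda pair: score(*pair))
```

Under the assumed spacing the grid contains 21 × 11 = 231 parameter pairs, each of which requires one full jackknife run.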
3. Results and Discussion
3.1. Test Method and Criteria
In statistical prediction, the independent dataset test, subsampling or K-fold crossover test and jackknife test are the three cross-validation methods often used to check a predictor for its accuracy [92]. However, among the three test methods, the jackknife test is deemed the least arbitrary that can always yield a unique result for a given benchmark dataset [18]. Accordingly, the jackknife test has been increasingly used and widely recognized by investigators to examine the quality of various predictors (see, e.g., [19, 21, 73, 75, 93–95]). Therefore, in this study we also adopted the jackknife test.
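The jackknife (leave-one-out) test can be sketched as follows: each sample is in turn singled out as the query and predicted by a model trained on all the remaining samples, so the outcome is unique for a given dataset. The function names are hypothetical, and the nearest-neighbor rule below is only a stand-in classifier for illustration; it is not the SVM used in this study.

```python
def jackknife_predictions(vectors, labels, train_and_predict):
    """Predict every sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(vectors)):
        train_x = vectors[:i] + vectors[i + 1:]  # leave sample i out
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(train_and_predict(train_x, train_y, vectors[i]))
    return predictions


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier: label of the closest training vector."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    closest = min(range(len(train_x)),
                  key=lambda j: squared_distance(train_x[j], query))
    return train_y[closest]
```

Any classifier with the same three-argument interface can be plugged in, for example a wrapper that trains an SVM on `train_x` and `train_y` and returns its prediction for `query`.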
In addition to an objective test method, we also need a set of metrics to reasonably measure the test outcome. Here, let us use the criterion proposed in [96, 97] to develop a set of more intuitive and easier-to-understand metrics; that is, the correct rates $\Lambda_{K}$ in predicting K-conotoxins, $\Lambda_{Na}$ in predicting Na-conotoxins, and $\Lambda_{Ca}$ in predicting Ca-conotoxins are defined by
$$\begin{cases} \Lambda_{K} = 1 - \dfrac{N_{Na}^{K} + N_{Ca}^{K}}{N^{K}} \\[2ex] \Lambda_{Na} = 1 - \dfrac{N_{K}^{Na} + N_{Ca}^{Na}}{N^{Na}} \\[2ex] \Lambda_{Ca} = 1 - \dfrac{N_{Na}^{Ca} + N_{K}^{Ca}}{N^{Ca}} \end{cases} \tag{11}$$
where $N^{K}$ is the total number of the K-conotoxins investigated, while $N_{Na}^{K}$ is the number of the K-conotoxins incorrectly predicted as the Na-conotoxins, and $N_{Ca}^{K}$ is the number of the K-conotoxins incorrectly predicted as the Ca-conotoxins; $N^{Na}$ is the total number of the Na-conotoxins investigated, while $N_{K}^{Na}$ is the number of the Na-conotoxins incorrectly predicted as the K-conotoxins and $N_{Ca}^{Na}$ is the number of the Na-conotoxins incorrectly predicted as the Ca-conotoxins; and $N^{Ca}$ is the total number of the Ca-conotoxins investigated, while $N_{Na}^{Ca}$ is the number of the Ca-conotoxins incorrectly predicted as the Na-conotoxins and $N_{K}^{Ca}$ is the number of the Ca-conotoxins incorrectly predicted as the K-conotoxins. From (11), it follows that
$$\begin{cases} \mathrm{OA} = \dfrac{\Lambda_{K} N^{K} + \Lambda_{Na} N^{Na} + \Lambda_{Ca} N^{Ca}}{N^{K} + N^{Na} + N^{Ca}} \\[2ex] \mathrm{AA} = \dfrac{\Lambda_{K} + \Lambda_{Na} + \Lambda_{Ca}}{3} \end{cases} \tag{12}$$
where OA stands for the overall accuracy and AA for the average accuracy.
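The metrics above can be computed directly from a table of true versus predicted types, as in the following sketch. The function name and the counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not results of this study.

```python
def type_metrics(confusion):
    """Per-type correct rates, overall accuracy (OA) and average accuracy (AA).

    confusion[true_type][predicted_type] is the number of samples of
    `true_type` that were predicted as `predicted_type`.
    """
    rates = {}
    total = 0
    total_wrong = 0
    for true_type, row in confusion.items():
        n = sum(row.values())                 # all samples of this type
        wrong = n - row.get(true_type, 0)     # predicted as another type
        rates[true_type] = 1 - wrong / n
        total += n
        total_wrong += wrong
    overall = 1 - total_wrong / total
    average = sum(rates.values()) / len(rates)
    return rates, overall, average


# Hypothetical counts for illustration only.
example = {
    "K":  {"K": 8, "Na": 1, "Ca": 1},
    "Na": {"K": 0, "Na": 18, "Ca": 2},
    "Ca": {"K": 1, "Na": 0, "Ca": 19},
}
example_rates, example_oa, example_aa = type_metrics(example)
```

Because the subsets differ in size, OA weights each type by its number of samples whereas AA treats the three types equally, so the two values generally differ.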
3.2. The Optimal Features
As mentioned above, a sample vector should contain neither too few nor too many features. This is because the former would limit the prediction quality due to lack of information, while the latter would generate a lot of noise due to redundancy. Therefore, we should find a set of optimal features, for which there is minimal redundancy among themselves but maximal relevancy to the target to be predicted. In the present study, such an optimal feature set is none other than the one given in (9).
Shown in Figure 4 is the IFS curve for the value of OA against the number of the counted features, as described in Section 2.3. As can be seen from there, the value of OA reached its peak of 91.1% when the top-ranked 50 dipeptides (Table 1) were taken into account.
Figure 4.

A plot to show the IFS curve, where the abscissa and ordinate axis denote the number of features and the overall accuracy, respectively. As shown in the figure, the value of the overall accuracy reached its peak (91.1%) when the top-ranked 50 dipeptide features were taken into account.
Table 1.
List of the 50 optimal features or dipeptides derived according to (7)–(9) as elaborated in the Section 2.3.
| AA | AS | CC | CH | CS | DH | DN | EN | GA | GH |
| GL | GT | GY | HA | HL | HS | IY | KD | KK | KM |
| KP | LN | LV | MC | MY | ND | NQ | NS | PI | QK |
| QT | RC | RD | RF | RN | RT | RW | SC | SG | TE |
| TF | TT | VV | WG | WI | YD | YH | YL | YT | YY |
The predictor thus obtained via the aforementioned procedures is called “iCTX-Type,” where “i” stands for “identify” and “CTX” for “conotoxin.”
A comparison of the current predictor iCTX-Type with the one in [7] (to the best of our knowledge, the only other existing predictor in this area) is given in Table 2, from which we can see the following. (i) For four of the five metrics defined in (11)-(12), iCTX-Type yielded higher scores than the method in [7]. In particular, iCTX-Type achieved higher overall accuracy (OA) and average accuracy (AA). (ii) Compared with the method of [7], which used 70 features, only 50 features were used in the present method (Table 1), indicating that iCTX-Type is more efficient in excluding redundancy and noise as well as in capturing the core features.
Table 2.
Comparison of the current method with the one in [7] by the jackknife test on the same benchmark dataset (Supporting Information S1) according to the metrics defined in (11)-(12).
| Method | Number of features counted | ΛK (%) | ΛNa (%) | ΛCa (%) | AA (%) | OA (%) |
|---|---|---|---|---|---|---|
| RBF networkᵃ | 70 | 91.7 | 88.4 | 88.9 | 89.7 | 89.3 |
| iCTX-Typeᵇ | 50 | 83.3 | 97.8 | 89.8 | 90.3 | 91.1 |
ᵃ See [7].
ᵇ This paper.
To further verify the performance of the current predictor, iCTX-Type was also used to identify the samples in the independent dataset $\mathbb{S}^{\mathrm{Ind}}$ (see Supporting Information S2), and the success rates (see (11)) thus obtained were 91.7%, 91.9%, and 90.5% for K-, Na-, and Ca-conotoxins, respectively. These results are consistent with those obtained by the jackknife test as given in Table 2, further indicating that the new predictor iCTX-Type is quite promising and holds a high potential to become a useful tool for in-depth study of ion channel-targeted conotoxins.
To enhance the value of its practical applications [98], a web server for the new iCTX-Type predictor was established as described below.
3.3. Web-Server Guide
For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided below on how to use the web server to get the desired results without the need to follow the mathematical equations, which were presented in this paper just for the integrity in developing the predictor.
Step 1. Open the web server at http://lin.uestc.edu.cn/server/iCTX-Type and you will see the top page of iCTX-Type on your computer screen, as shown in Figure 5. Click on the Read Me button to see a brief introduction about the predictor and the caveat when using it.
Figure 5.

A screenshot to show the top page of the iCTX-Type web server. Its website address is http://lin.uestc.edu.cn/server/iCTX-Type.
Step 2. Either type or copy/paste the query peptide sequences into the input box at the center of Figure 5. The input sequence should be in the FASTA format. A sequence in FASTA format consists of a single initial line beginning with a greater-than symbol “>” in the first column, followed by lines of sequence data. The words right after the “>” symbol in the single initial line are optional and only used for the purpose of identification and description. All lines should be no longer than 120 characters and usually do not exceed 80 characters. The sequence ends if another line starting with a “>” appears; this indicates the start of another sample sequence. Example sequences in FASTA format can be seen by clicking on the Example button right above the input box.
Step 3. Click on the Submit button to see the predicted result. For instance, when using the three example peptide sequences as input and clicking the Submit button, you will see the following shown on the screen of your computer: the outcome for the 1st query sample is “Ca-conotoxin”; the outcome for the 2nd query sample is “K-conotoxin”; the outcome for the 3rd query sample is “Na-conotoxin.” All these results are fully consistent with the experimental observations. It takes only a few seconds for the above computation before the predicted result appears on your computer screen; the more query sequences are submitted, the longer it usually takes.
Step 4. Click on the Data button to download the benchmark datasets used to train and test the iCTX-Type predictor.
Step 5. Click on the Citation button to find the relevant papers that document the detailed development and algorithm of iCTX-Type.
Caveats. The input query sequences must be formed by the single-letter codes of the 20 native amino acids; any other characters such as “B,” “X,” “U,” and “Z” are invalid and should not be part of the peptide sequence.
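Before submission, the input format and the caveat above can be checked locally with a short script such as the one sketched below. It is only an illustration: the function names are hypothetical, and the script is not part of the iCTX-Type web server.

```python
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 native amino acids


def parse_fasta(text):
    """Split FASTA-formatted text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted codes in `sequence` that are not native amino acids."""
    return sorted(set(sequence) - VALID_CODES)
```

A query whose `invalid_residues` list is not empty, for example one containing “B” or “X”, should be corrected or removed before being pasted into the input box.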
4. Conclusion
It is anticipated that iCTX-Type may become a useful high throughput tool for both basic research and drug development, particularly for in-depth investigation into the mechanisms of ion-channels and developing new drugs to treat chronic pain, epilepsy, spasticity, and cardiovascular diseases, among others.
It is instructive to point out that, since the binding of conotoxins to ion channels is highly selective and specific, the information obtained by iCTX-Type in identifying the types of conotoxins may also be very useful for designing ion channel inhibitors according to Chou's distorted key theory as elaborated in [99] and briefed in a Wikipedia article at http://en.wikipedia.org/wiki/Chou's_distorted_key_theory_for_peptide_drugs.
Supplementary Material
Supporting Information S1: The benchmark dataset 𝕊 contains 112 conotoxins, of which 24 belong to K-channel-targeting type, 43 to Na-channel-targeting type, and 45 to Ca-channel-targeting type.
Supporting Information S2: The independent dataset 𝕊 Ind contains 70 conotoxins, of which 12 are of K-channel-targeting type, 37 of Na-channel-targeting type, and 21 of Ca-channel-targeting type. None of the samples listed here occurs in benchmark dataset 𝕊.
Acknowledgments
The authors wish to thank the anonymous reviewers for their constructive comments, which were very helpful for strengthening the presentation of this study. This work was supported by the National Nature Scientific Foundation of China (nos. 61202256, 61301260, and 61100092), the Nature Scientific Foundation of Hebei Province (no. C2013209105), and the Fundamental Research Funds for the Central Universities (nos. ZYGX2012J113 and ZYGX2013J102).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
- 1.Gabashvili IS, Sokolowski BH, Morton CC, Giersch AB. Ion channel gene expression in the inner ear. Journal of the Association for Research in Otolaryngology. 2007;8(3):305–328. doi: 10.1007/s10162-007-0082-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Schnell JR, Chou JJ. Structure and mechanism of the M2 proton channel of influenza A virus. Nature. 2008;451(7178):591–595. doi: 10.1038/nature06531. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Pielak RM, Schnell JR, Chou JJ. Mechanism of drug inhibition and drug resistance of influenza A M2 channel. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(18):7379–7384. doi: 10.1073/pnas.0902548106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.OuYang B, Xie S, Berardi MJ, et al. Unusual architecture of the p7 channel from hepatitis C virus. Nature. 2013;498(7455):521–525. doi: 10.1038/nature12283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Xiao X, Min JL, Wang P. Predict drug-protein interaction in cellular networking. Current Topics in Medicinal Chemistry. 2013;13(14):1707–1712. doi: 10.2174/15680266113139990121. [DOI] [PubMed] [Google Scholar]
- 6.Chou K-C. Insights from modeling three-dimensional structures of the human potassium and sodium channels. Journal of Proteome Research. 2004;3(4):856–861. doi: 10.1021/pr049931q. [DOI] [PubMed] [Google Scholar]
- 7.Yuan LF, Ding C, Guo SH, Ding H, Chen W, Lin H. Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. Toxicology in Vitro. 2013;27(2):852–856. doi: 10.1016/j.tiv.2012.12.024. [DOI] [PubMed] [Google Scholar]
- 8.Daly NL, Craik DJ. Structural studies of conotoxins. IUBMB Life. 2009;61(2):144–150. doi: 10.1002/iub.158. [DOI] [PubMed] [Google Scholar]
- 9.Mondal S, Bhavna R, Mohan Babu R, Ramakumar S. Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. Journal of Theoretical Biology. 2006;243(2):252–260. doi: 10.1016/j.jtbi.2006.06.014. [DOI] [PubMed] [Google Scholar]
- 10.Chou K-C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins. 2001;43(3):246–255. doi: 10.1002/prot.1035. Erratum in: Proteins, vol. 44, no. 1, article 60, 2001. [DOI] [PubMed] [Google Scholar]
- 11.Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005;21(1):10–19. doi: 10.1093/bioinformatics/bth466. [DOI] [PubMed] [Google Scholar]
- 12.Lin H, Li QZ. Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochemical and Biophysical Research Communications. 2007;354(2):548–551. doi: 10.1016/j.bbrc.2007.01.011. [DOI] [PubMed] [Google Scholar]
- 13.Yin JB, Fan YX, Shen HB. Conotoxin superfamily prediction using diffusion maps dimensionality reduction and subspace classifier. Current Protein and Peptide Science. 2011;12(6):580–588. doi: 10.2174/138920311796957702.
- 14.Laht S, Koua D, Kaplinski L, Lisacek F, Stöcklin R, Remm M. Identification and classification of conopeptides using profile hidden Markov Models. Biochimica et Biophysica Acta. 2012;1824(3):488–492. doi: 10.1016/j.bbapap.2011.12.004.
- 15.Koua D, Laht S, Kaplinski L, et al. Position-specific scoring matrix and hidden Markov model complement each other for the prediction of conopeptide superfamilies. Biochimica et Biophysica Acta. 2013;1834(4):717–724. doi: 10.1016/j.bbapap.2012.12.015.
- 16.Gowd KH, Dewan KK, Iengar P, Krishnan KS, Balaram P. Probing peptide libraries from Conus achatinus using mass spectrometry and cDNA sequencing: identification of δ and ω-conotoxins. Journal of Mass Spectrometry. 2008;43(6):791–805. doi: 10.1002/jms.1377.
- 17.Hillyard DR, McIntosh MJ, Jones RM, et al. O-superfamily conotoxin peptides. Patent number JP2003533178, 2008.
- 18.Chou K-C. Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology. 2011;273(1):236–247. doi: 10.1016/j.jtbi.2010.12.024.
- 19.Guo SH, Deng EZ, Xu LQ, Ding H, Lin H, Chen W. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics. 2014. doi: 10.1093/bioinformatics/btu083.
- 20.Xu Y, Ding J, Wu LY. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE. 2013;8(2):e55844. doi: 10.1371/journal.pone.0055844.
- 21.Qiu WR, Xiao X. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. International Journal of Molecular Sciences. 2014;15(2):1746–1766. doi: 10.3390/ijms15021746.
- 22.Liu B, Zhang D, Xu R, et al. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics. 2014;30(4):472–479. doi: 10.1093/bioinformatics/btt709.
- 23.Chen W, Feng PM, Lin H, Chou KC. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Research. 2013;41(6):e68. doi: 10.1093/nar/gks1450.
- 24.Xiao JL, Min X, Chou K-C. iEzy-Drug: a web server for identifying the interaction between enzymes and drugs in cellular networking. BioMed Research International. 2013;2013:701317 (13 pages). doi: 10.1155/2013/701317.
- 25.Xiao X, Min JL, Wang P. iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints. Journal of Theoretical Biology. 2013;337:71–79. doi: 10.1016/j.jtbi.2013.08.013.
- 26.Xu Y, Shao XJ, Wu LY, Deng NY, Chou KC. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. PeerJ. 2013;1:e171. doi: 10.7717/peerj.171.
- 27.Chen W, Lin H, Feng PM, Ding C, Zuo YC. iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS ONE. 2012;7(10):e47843. doi: 10.1371/journal.pone.0047843.
- 28.Xu Y, Wen X, Shao XJ, Deng NY. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. International Journal of Molecular Sciences. 2014;15(5):7594–7610. doi: 10.3390/ijms15057594.
- 29.The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Research. 2012;40:D71–D75. doi: 10.1093/nar/gkr981.
- 30.Chou KC, Shen HB. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers. Journal of Proteome Research. 2006;5(8):1888–1897. doi: 10.1021/pr060167c.
- 31.Chou KC, Shen HB. Recent progress in protein subcellular location prediction. Analytical Biochemistry. 2007;370(1):1–16. doi: 10.1016/j.ab.2007.07.006.
- 32.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–3152. doi: 10.1093/bioinformatics/bts565.
- 33.Wootton JC, Federhen S. Statistics of local complexity in amino acid sequences and sequence databases. Computers and Chemistry. 1993;17(2):149–163.
- 34.Chou JJ. A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus protease-cleavable sites in proteins. Biopolymers. 1993;33(9):1405–1414. doi: 10.1002/bip.360330910.
- 35.Chou KC. Prediction of G-protein-coupled receptor classes. Journal of Proteome Research. 2005;4(4):1413–1418. doi: 10.1021/pr050087t.
- 36.Wang M, Yang J, Xu ZJ, Chou KC. SLLE for predicting membrane protein types. Journal of Theoretical Biology. 2005;232(1):7–15. doi: 10.1016/j.jtbi.2004.07.023.
- 37.Xiao X, Wang P, Chou K-C. Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image. Journal of Theoretical Biology. 2008;254(3):691–696. doi: 10.1016/j.jtbi.2008.06.016.
- 38.Feng KY, Cai YD, Chou KC. Boosting classifier for predicting protein domain structural class. Biochemical and Biophysical Research Communications. 2005;334(1):213–217. doi: 10.1016/j.bbrc.2005.06.075.
- 39.Cai YD, Chou KC. Artificial neural network model for predicting α-turn types. Analytical Biochemistry. 1999;268(2):407–409. doi: 10.1006/abio.1998.2992.
- 40.Thompson TB, Zheng C, Chou K-C. Neural network prediction of the HIV-1 protease cleavage sites. Journal of Theoretical Biology. 1995;177(4):369–379. doi: 10.1006/jtbi.1995.0254.
- 41.Zhang CT, Chou KC. An optimization approach to predicting protein structural class from amino acid composition. Protein Science. 1992;1(3):401–408. doi: 10.1002/pro.5560010312.
- 42.Feng PM, Chen W, Lin H. iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Analytical Biochemistry. 2013;442(1):118–125. doi: 10.1016/j.ab.2013.05.024.
- 43.Xiao X, Wang P, Chou KC. iNR-physchem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix. PLoS ONE. 2012;7(2):e30869. doi: 10.1371/journal.pone.0030869.
- 44.Lin WZ, Fang JA, Xiao X, Chou KC. iDNA-prot: identification of DNA binding proteins using random forest with grey model. PLoS ONE. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756.
- 45.Kandaswamy KK, Chou K-C, Martinetz T, et al. AFP-Pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties. Journal of Theoretical Biology. 2011;270(1):56–62. doi: 10.1016/j.jtbi.2010.10.037.
- 46.Cai YD, Chou KC. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics. 2004;20(7):1151–1156. doi: 10.1093/bioinformatics/bth054.
- 47.Chou KC, Cai YD. Prediction of protease types in a hybridization space. Biochemical and Biophysical Research Communications. 2006;339(3):1015–1020. doi: 10.1016/j.bbrc.2005.10.196.
- 48.Shen H, Chou KC. Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. Biochemical and Biophysical Research Communications. 2005;334(1):288–292. doi: 10.1016/j.bbrc.2005.06.087.
- 49.Chou KC, Shen HB. Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. Journal of Proteome Research. 2007;6(5):1728–1734. doi: 10.1021/pr060635i.
- 50.Shen HB, Chou KC. A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. Analytical Biochemistry. 2009;394(2):269–274. doi: 10.1016/j.ab.2009.07.046.
- 51.Zhang TL, Ding YS, Chou KC. Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern. Journal of Theoretical Biology. 2008;250(1):186–193. doi: 10.1016/j.jtbi.2007.09.014.
- 52.Xiao X, Wang P, Chou KC. GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. Molecular BioSystems. 2011;7(3):911–919. doi: 10.1039/c0mb00170h.
- 53.Shen HB, Yang J, Chou KC. Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. Journal of Theoretical Biology. 2006;240(1):9–13. doi: 10.1016/j.jtbi.2005.08.016.
- 54.Xiao X, Min JL, Wang P. iGPCR-Drug: a web server for predicting interaction between GPCRs and drugs in cellular networking. PLoS ONE. 2013;8(8):e72234. doi: 10.1371/journal.pone.0072234.
- 55.Xiao X, Wang P, Lin W-Z, Jia J-H, Chou K-C. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical Biochemistry. 2013;436(2):168–177. doi: 10.1016/j.ab.2013.01.019.
- 56.Chou KC. Some remarks on predicting multi-label attributes in molecular biosystems. Molecular BioSystems. 2013;9(6):1092–1100. doi: 10.1039/c3mb25555g.
- 57.Nakashima H, Nishikawa K, Ooi T. The folding type of a protein is relevant to the amino acid composition. Journal of Biochemistry. 1986;99(1):153–162. doi: 10.1093/oxfordjournals.jbchem.a135454.
- 58.Cedano J, Aloy P, Pérez-Pons JA, Querol E. Relation between amino acid composition and cellular location of proteins. Journal of Molecular Biology. 1997;266(3):594–600. doi: 10.1006/jmbi.1996.0804.
- 59.Zhou G-P. An intriguing controversy over protein structural class prediction. Protein Journal. 1998;17(8):729–738. doi: 10.1023/a:1020713915365.
- 60.Lin S-X, Lapointe J. Theoretical and experimental biology in one. Journal of Biomedical Science and Engineering (JBiSE). 2013;6:435–442.
- 61.Zhou X-B, Chen C, Li Z-C, Zou X-Y. Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. Journal of Theoretical Biology. 2007;248(3):546–551. doi: 10.1016/j.jtbi.2007.06.001.
- 62.Nanni L, Lumini A. Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids. 2008;34(4):653–660. doi: 10.1007/s00726-007-0018-1.
- 63.Georgiou DN, Karakasidis TE, Nieto JJ, Torres A. Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition. Journal of Theoretical Biology. 2009;257(1):17–26. doi: 10.1016/j.jtbi.2008.11.003.
- 64.Esmaeili M, Mohabatkar H, Mohsenzadeh S. Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses. Journal of Theoretical Biology. 2010;263(2):203–209. doi: 10.1016/j.jtbi.2009.11.016.
- 65.Mohabatkar H. Prediction of cyclin proteins using Chou’s pseudo amino acid composition. Protein and Peptide Letters. 2010;17(10):1207–1214. doi: 10.2174/092986610792231564.
- 66.Sahu SS, Panda G. A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction. Computational Biology and Chemistry. 2010;34(5-6):320–327. doi: 10.1016/j.compbiolchem.2010.09.002.
- 67.Mohabatkar H, Mohammad Beigi M, Esmaeili A. Prediction of GABA(A) receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine. Journal of Theoretical Biology. 2011;281(1):18–23. doi: 10.1016/j.jtbi.2011.04.017.
- 68.Mohammad Beigi M, Behjati M, Mohabatkar H. Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach. Journal of Structural and Functional Genomics. 2011;12(4):191–197. doi: 10.1007/s10969-011-9120-4.
- 69.Mei S. Multi-kernel transfer learning based on Chou’s PseAAC formulation for protein submitochondria localization. Journal of Theoretical Biology. 2012;293:121–130. doi: 10.1016/j.jtbi.2011.10.015.
- 70.Nanni L, Brahnam S, Lumini A. Wavelet images and Chou’s pseudo amino acid composition for protein classification. Amino Acids. 2012;43(2):657–665. doi: 10.1007/s00726-011-1114-9.
- 71.Nanni L, Lumini A, Gupta D, Garg A. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou’s pseudo amino acid composition and on evolutionary information. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(2):467–475. doi: 10.1109/TCBB.2011.117.
- 72.Gupta MK, Niyogi R, Misra M. An alignment-free method to find similarity among protein sequences via the general form of Chou’s pseudo amino acid composition. SAR and QSAR in Environmental Research. 2013;24(7):597–609. doi: 10.1080/1062936X.2013.773378.
- 73.Hajisharifi Z, Piryaiee M, Mohammad Beigi M, Behbahani M, Mohabatkar H. Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test. Journal of Theoretical Biology. 2014;341:34–40. doi: 10.1016/j.jtbi.2013.08.037.
- 74.Huang C, Yuan J. Using radial basis function on the general form of Chou’s pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites. Biosystems. 2013;113(1):50–57. doi: 10.1016/j.biosystems.2013.04.005.
- 75.Huang C, Yuan JQ. Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou’s pseudo amino acid compositions. Journal of Theoretical Biology. 2013;335:205–212. doi: 10.1016/j.jtbi.2013.06.034.
- 76.Mohabatkar H, Mohammad Beigi M, Abdolahi K, Mohsenzadeh S. Prediction of allergenic proteins by means of the concept of Chou’s pseudo amino acid composition and a machine learning approach. Medicinal Chemistry. 2013;9(1):133–137. doi: 10.2174/157340613804488341.
- 77.Sarangi AN, Lohani M, Aggarwal R. Prediction of essential proteins in prokaryotes by incorporating various physico-chemical features into the general form of Chou’s pseudo amino acid composition. Protein and Peptide Letters. 2013;20(7):781–795. doi: 10.2174/0929866511320070008.
- 78.Wan S, Mak MW, Kung SY. GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition. Journal of Theoretical Biology. 2013;323:40–48. doi: 10.1016/j.jtbi.2013.01.012.
- 79.Chen W, Lei TY, Jin DC, Lin H. PseKNC: a flexible web-server for generating pseudo K-tuple nucleotide composition. Analytical Biochemistry. 2014;456:53–60. doi: 10.1016/j.ab.2014.04.001.
- 80.Li B-Q, Huang T, Liu L, Cai Y-D, Chou K-C. Identification of colorectal cancer related genes with mRMR and shortest path in protein-protein interaction network. PLoS ONE. 2012;7(4):e33393. doi: 10.1371/journal.pone.0033393.
- 81.Huang T, Wang J, Cai Y-D, Yu H, Chou K-C. Hepatitis C virus network based classification of hepatocellular cirrhosis and carcinoma. PLoS ONE. 2012;7(4):e34460. doi: 10.1371/journal.pone.0034460.
- 82.Jiang Y, Huang T, Chen L, Gao YF, Cai Y, Chou K-C. Signal propagation in protein interaction network during colorectal cancer progression. BioMed Research International. 2013;2013:287019 (9 pages). doi: 10.1155/2013/287019.
- 83.Shen H-B, Chou K-C. PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry. 2008;373(2):386–388. doi: 10.1016/j.ab.2007.10.012.
- 84.Du P, Wang X, Xu C, Gao Y. PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions. Analytical Biochemistry. 2012;425(2):117–119. doi: 10.1016/j.ab.2012.03.015.
- 85.Cao DS, Xu QS, Liang YZ. propy: a tool to generate various modes of Chou’s PseAAC. Bioinformatics. 2013;29(7):960–962. doi: 10.1093/bioinformatics/btt072.
- 86.Du P, Gu S, Jiao Y. PseAAC-General: fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets. International Journal of Molecular Sciences. 2014;15(3):3495–3506. doi: 10.3390/ijms15033495.
- 87.Chen YW, Lin CJ. Combining SVMs with various feature selection strategies. In: Guyon I, Nikravesh M, Gunn S, Zadeh L, editors. Feature Extraction. Berlin, Germany: Springer; 2006. pp. 315–324.
- 88.Lin H, Ding H, Guo F-B, Huang J. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Molecular Diversity. 2010;14(4):667–671. doi: 10.1007/s11030-009-9205-1.
- 89.Chou K-C, Cai Y-D. Using functional domain composition and support vector machines for prediction of protein subcellular location. The Journal of Biological Chemistry. 2002;277(48):45765–45769. doi: 10.1074/jbc.M204161200.
- 90.Cai Y-D, Zhou G-P, Chou K-C. Support vector machines for predicting membrane protein types by using functional domain composition. Biophysical Journal. 2003;84(5):3257–3263. doi: 10.1016/S0006-3495(03)70050-2.
- 91.Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press; 2000.
- 92.Chou KC, Zhang CT. Review: prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology. 1995;30(4):275–349. doi: 10.3109/10409239509083488.
- 93.Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. Proteins: Structure, Function and Genetics. 2001;44(1):57–59. doi: 10.1002/prot.1071.
- 94.Chou K-C, Wu Z-C, Xiao X. iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. Molecular BioSystems. 2012;8(2):629–641. doi: 10.1039/c1mb05420a.
- 95.Chou K-C, Wu Z-C, Xiao X. iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS ONE. 2011;6(3):e18258. doi: 10.1371/journal.pone.0018258.
- 96.Chou K-C. Using subsite coupling to predict signal peptides. Protein Engineering. 2001;14(2):75–79. doi: 10.1093/protein/14.2.75.
- 97.Chou KC. Prediction of signal peptides using scaled window. Peptides. 2001;22(12):1973–1979. doi: 10.1016/s0196-9781(01)00540-x.
- 98.Chou KC, Shen HB. Review: recent advances in developing web-servers for predicting protein attributes. Natural Science. 2009;1(2):63–92.
- 99.Chou KC. Review: prediction of human immunodeficiency virus protease cleavage sites in proteins. Analytical Biochemistry. 1996;233(1):1–14. doi: 10.1006/abio.1996.0001.
Associated Data
Supplementary Materials
Supporting Information S1: The benchmark dataset 𝕊 contains 112 conotoxins, of which 24 belong to K-channel-targeting type, 43 to Na-channel-targeting type, and 45 to Ca-channel-targeting type.
Supporting Information S2: The independent dataset 𝕊 Ind contains 70 conotoxins, of which 12 are of K-channel-targeting type, 37 of Na-channel-targeting type, and 21 of Ca-channel-targeting type. None of the samples listed here occurs in the benchmark dataset 𝕊.
