iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC

Hui Yang; Wang-Ren Qiu; Guoqing Liu; Feng-Biao Guo; Wei Chen; Kuo-Chen Chou; Hao Lin

doi:10.7150/ijbs.24616

2018 May 22;14(8):883–891.



PMCID: PMC6036749 PMID: 29989083

Abstract

Meiotic recombination is caused by meiotic double-strand DNA breaks. In some regions the frequency of DNA recombination is relatively high, while in others it is low: the former are usually called “recombination hotspots” and the latter “recombination coldspots”. Information on the hot and cold spots may provide important clues for understanding the mechanism of genome evolution. Therefore, it is important to predict these spots accurately. In this study, we rebuilt the benchmark dataset by unifying all samples to the same length (131 bp). On this basis and using an SVM (Support Vector Machine) classifier, a new predictor called “iRSpot-Pse6NC” was developed by incorporating key hexamer features, selected via a binomial distribution approach, into the general PseKNC (Pseudo K-tuple Nucleotide Composition). Rigorous cross-validation showed that the proposed predictor is superior to its counterparts in overall accuracy, stability, sensitivity and specificity. For the convenience of experimental scientists, a web-server for iRSpot-Pse6NC has been established at http://lin-group.cn/server/iRSpot-Pse6NC, through which users can easily obtain their desired results without going through the detailed mathematical equations involved.

Keywords: Recombination spot, 5-step rules, Key hexamers, PseKNC, SVM, Webserver

Introduction

Meiotic recombination, which occurs in every generation of diploid organisms, is caused by meiotic double-strand DNA breaks (DSBs) ¹ (Figure 1). Meiosis not only maintains a stable chromosome number within a species but also provides an evolutionary mechanism through which species adapt to environmental changes ². Recombination exchanges genetic information between homologous chromosomes and is therefore one of the main driving forces of genome evolution. Regions where DNA recombination occurs at relatively high frequency are referred to as recombination hotspots, while regions with low frequency are called recombination coldspots ³⁻⁵.

Figure 1. Schematic drawing of the meiotic recombination pathways in a DNA system.

There have been many in-depth studies of recombination sites ³,⁶⁻⁹. Gerton et al. ³ mapped double-strand break sites on the chromosomes of Saccharomyces cerevisiae (S. cerevisiae) and found that hotspots were non-randomly associated with regions of high GC base composition, while coldspots were non-randomly associated with centromeres and telomeres. Hotspots that require transcription factor binding are called α hotspots, and the others are called β hotspots ³. Recently, there have been new developments in research on recombination sites. ChIP experiments showed that substantial Spo11 persists at Rec8 binding sites during DSB formation ¹⁰; PRDM9, a histone H3K4 trimethyltransferase, is involved in the initiation of recombination and controls the activation of recombination hotspots ¹¹; and regions with high nucleosome occupancy were found to have high recombination rates in the yeast genome ¹².

Correct identification of recombination spots can provide important clues for understanding the mechanism of evolution. Biochemical experiments can generally provide accurate information for determining recombination spots. However, with the development of high-throughput sequencing techniques, more and more genome data are being generated, and determining recombination spots with wet-lab experiments alone requires increasingly expensive materials and long experimental periods. Machine learning-based methods are therefore a good choice for identifying recombination spots in a timely and accurate manner. Several methods have been developed for this purpose. Jiang et al. first developed a model based on gapped dinucleotide composition and random forest (RF) to predict meiotic recombination hotspots and coldspots in S. cerevisiae ¹³. In the meantime, Zhou et al. established an SVM-based model to discriminate hotspots from coldspots in S. cerevisiae using codon composition ¹⁴. Subsequently, Liu et al. proposed using the increment of diversity combined with a quadratic discriminant to predict recombination spots ¹⁵. Chen et al. developed a DNA sample descriptor called pseudo dinucleotide composition (PseDNC) to improve the prediction accuracy for recombination hotspots and coldspots ¹⁶. Building on the concept of PseDNC, Li et al. ¹⁷ and Qiu et al. ¹⁸ also developed different prediction models for this problem. Liu et al. incorporated feature weights into a recombination hotspot prediction model ¹⁹. A predictor called iRSpot-DACC was also presented to predict recombination hotspots and coldspots ²⁰. Recently, the same problem was further investigated using the Z curve approach ²¹ and an ensemble learning approach ²².

Although the aforementioned methods achieved quite encouraging results, further studies are needed for the following reasons. (i) The DNA samples used to train the models have different lengths, which prevents the models from being widely useful because users do not know what working length to use for a query DNA sequence. For example, when using these methods to scan a chromosome, one does not know the optimal width of the scan window ²³ for the sequence concerned. In fact, the published webservers based on those methods return only a single prediction even for a chromosome thousands of base pairs long, whereas a genome contains many recombination spots. Therefore, most of these models are quite limited in practical applications. (ii) Some works ¹³,¹⁴,²¹,²⁴ used codon composition or coding-region information to formulate DNA samples. However, recombination spots are not always located in coding regions; some non-coding regions also contain recombination spots. Thus, these methods cannot identify recombination spots in intergenic regions. (iii) The prediction accuracy is still far from satisfactory and should be further improved. (iv) Only three webservers have been published. For the convenience of most experimental scientists, more user-friendly webservers are needed.

The present study was devoted to developing a more powerful predictor in this area by addressing the four issues above. To make the new predictor clearer in its logical development and more useful in practice, Chou's 5-step rules ²⁵ were followed, as in a series of recent studies (see, e.g., ²⁶⁻³⁵).

Materials and Methods

Benchmark dataset: hot/cold spots DNA sequences

According to Chou's 5-step rules, the first prerequisite for establishing an effective predictor for a biological system is to construct or select a high-quality benchmark dataset. In this study, the raw data were derived from Gerton et al. ³, who used DNA microarrays at single-gene resolution to estimate DSB formation adjacent to each ORF in S. cerevisiae. They measured the ratio of DSB-rich probes hybridized to total genomic probes. Based on these experimental data, Jiang et al. ¹³ constructed a benchmark dataset containing 490 recombination hotspots and 591 coldspots.

Most existing models ¹³⁻²⁰ were built on this benchmark dataset. The length distribution of the original samples is shown in Figure 2: lengths range widely, from 131 bp for the shortest sequence to thousands of bp for the longest. To remove this length heterogeneity, we rebuilt the benchmark dataset based on the observation that recombination hotspots are correlated with peaks of G+C base composition ³. Because the shortest sequence is 131 bp long, we unified the length of every sample to 131 bp: for each sequence longer than 131 bp, we chose the 131-bp subsequence with the maximum GC content. As a result, the new dataset still contains 490 recombination hotspots and 591 recombination coldspots, but all sequences are now 131 bp long. The new benchmark dataset can be downloaded from http://lin-group.cn/server/iRSpot-Pse6NC.
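The window selection described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `max_gc_window` is hypothetical, it scans every 131-bp window with a running G+C count, and it keeps the leftmost window when several windows tie, which is an assumption because tie handling is not specified.

```python
def max_gc_window(seq: str, width: int = 131) -> str:
    """Return the width-bp subsequence of seq with the highest G+C count.

    Hypothetical helper; ties are broken by keeping the leftmost window.
    """
    seq = seq.upper()
    if len(seq) < width:
        raise ValueError(f"sequence shorter than {width} bp")
    is_gc = [1 if base in "GC" else 0 for base in seq]
    # G+C count of the first window, then slide one base at a time:
    # add the base entering on the right, drop the base leaving on the left.
    current = sum(is_gc[:width])
    best_count, best_start = current, 0
    for start in range(1, len(seq) - width + 1):
        current += is_gc[start + width - 1] - is_gc[start - 1]
        if current > best_count:
            best_count, best_start = current, start
    return seq[best_start:best_start + width]
```

The running count makes the scan linear in sequence length, so even sequences thousands of bp long are processed quickly.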

Figure 2. The length distribution of the samples in the original benchmark dataset.

Hexamer composition and its PseKNC vector

The second important step in developing a predictor that discriminates recombination hotspots from coldspots is to translate a DNA sequence D with L bases into a vector. This is because existing machine-learning algorithms can only handle vectors, not sequences, as elaborated in ³⁶. However, a vector defined in a discrete model may completely lose the sequence-order or pattern information. To deal with this problem, PseAAC (Pseudo Amino Acid Composition) was introduced ³⁷. Since it was proposed, the concept of PseAAC has rapidly penetrated many biomedicine and drug development areas ³⁸,³⁹ as well as nearly all areas of computational proteomics (see, e.g., ⁴⁰⁻⁴⁸ and the long list of references cited in a recent review ⁴⁹). Encouraged by the success of PseAAC with protein/peptide sequences, the idea has been extended to DNA/RNA sequences ¹⁶,²²,²⁴,³²,⁵⁰ in computational genomics via PseKNC (Pseudo K-tuple Nucleotide Composition) ⁵¹,⁵². According to ⁵³, a DNA sample with L nucleic acid residues can be expressed as:

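$$\mathbf{D} = R_1 R_2 R_3 \cdots R_i \cdots R_L \tag{1}$$

where $R_i \in \{\mathrm{A}, \mathrm{C}, \mathrm{G}, \mathrm{T}\}$ denotes the nucleotide at the i-th sequence position.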

The general form of its PseKNC vector can be formulated as:

$$\mathbf{D} = \left[\phi_1\ \phi_2\ \cdots\ \phi_u\ \cdots\ \phi_\Omega\right]^{\mathbf{T}} \tag{2}$$

where $\mathbf{T}$ is the transpose operator and the subscript $\Omega$ is an integer; the value of $\Omega$ and the components $\phi_u$ depend on how the desired features and properties are extracted from the DNA sequence. In this study, they are defined as follows.

K-tuple (or K-mer) nucleotide composition has important biological significance ⁵⁴: the frequency distribution of K-tuple nucleotides contains most of the information in a DNA sequence, and the whole sequence can be uniquely determined from it. K-mer nucleotide composition has been widely used in gene identification ⁵⁵ and in the recognition of other regulatory elements ²⁴,⁵⁶⁻⁵⁹. Several studies ⁶⁰,⁶¹ have shown that hexamer (6-mer) distributions have characteristic properties that differ among species and among types of DNA fragments. We therefore use hexamers (K = 6), so the dimension of the PseKNC vector in Eq. 2 is:

$$\Omega = 4^6 = 4096 \tag{3}$$

and its components are given by:

$$\phi_u = \frac{n_u}{L - 5}, \qquad u = 1, 2, \ldots, 4096 \tag{4}$$

where $n_u$ is the number of occurrences of the u-th hexamer in the sample and L is the length of the sample sequence. Thus, each DNA sample is represented by a 4096-D PseKNC vector.
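A minimal sketch of the 4096-D hexamer vector follows, assuming overlapping hexamers counted with a step of one base and normalised by L − 5, the number of overlapping hexamers in a sequence of length L (our reading of Eq. 4). The lexicographic ordering of the 4096 hexamers and the names `HEXAMERS`, `INDEX` and `hexamer_vector` are illustrative choices; skipping windows that contain ambiguous bases such as N is a further assumption.

```python
from itertools import product

# All 4**6 = 4096 hexamers in lexicographic order (AAAAAA, AAAAAC, ..., TTTTTT).
# Any fixed ordering gives an equivalent feature vector.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]
INDEX = {h: u for u, h in enumerate(HEXAMERS)}

def hexamer_vector(seq: str) -> list[float]:
    """Occurrence count of each hexamer divided by L - 5."""
    seq = seq.upper()
    n_windows = len(seq) - 5  # overlapping hexamers in a length-L sequence
    if n_windows <= 0:
        raise ValueError("sequence must be at least 6 bp long")
    counts = [0] * len(HEXAMERS)
    for i in range(n_windows):
        u = INDEX.get(seq[i:i + 6])
        if u is not None:  # skip windows containing N or other ambiguity codes
            counts[u] += 1
    return [c / n_windows for c in counts]
```

For a 131-bp benchmark sample there are 126 overlapping hexamers, so at most 126 of the 4096 components are non-zero, which is one reason the feature selection step below matters.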

The rule for ranking features

Representing each DNA sequence by 4096 features may cause three problems ⁶²,⁶³: (i) the features may contain redundant or irrelevant information; (ii) the model may over-fit and generalize poorly; and (iii) the curse of dimensionality and a heavy computational burden. These problems can be alleviated by feature selection ⁶⁴. Many effective feature selection and dimensionality reduction techniques have been proposed, such as diffusion maps ⁶⁵, principal component analysis (PCA) ⁶⁶⁻⁶⁸, analysis of variance (ANOVA) ⁶⁹,⁷⁰, recursive feature elimination ⁷¹,⁷² and geometry preserving projections (GPP) ⁷³. These techniques are effective at reducing interference from noisy or irrelevant features and thus improving prediction quality.

Here, let us define a prior probability given by

$$q_i = \frac{m_i}{M}, \quad i = 1, 2, \qquad M = m_1 + m_2 \tag{5}$$

where M is the total number of hexamer occurrences in the benchmark dataset (including both positive and negative samples), and $m_i$ is the number of hexamer occurrences in the i-th type, with i = 1 referring to the positive subset and i = 2 to the negative subset.

The probability of the j-th hexamer occurring in type i by chance can then be formulated as a binomial upper-tail probability:

$$P(n_{ij}) = \sum_{m=n_{ij}}^{N_j} \frac{N_j!}{m!\,(N_j - m)!}\, q_i^{m} \left(1 - q_i\right)^{N_j - m} \tag{6}$$

where $N_j$ is the total number of occurrences of the j-th hexamer in the benchmark dataset and $n_{ij}$ is its number of occurrences in type i. The smaller $P(n_{ij})$ is, the less likely it is that the j-th hexamer occurs this often in type i by chance, and hence the more likely the hexamer is to be biologically significant. The confidence level (CL) of the j-th hexamer occurring in the i-th type of sample is defined as:

$$\mathrm{CL}_{ij} = 1 - P(n_{ij}), \qquad i = 1, 2;\ j = 1, 2, \ldots, 4096 \tag{7}$$

Let

$$\mathrm{CL}_j = \max\left(\mathrm{CL}_{1j},\ \mathrm{CL}_{2j}\right), \qquad j = 1, 2, \ldots, 4096 \tag{8}$$

Then the 4096 hexamers can be ranked in descending order of the values given by Eq. 8.
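The ranking can be sketched as follows, assuming the formulation spelled out above: prior $q_i = m_i/M$, binomial upper tail $P(n_{ij})$, $\mathrm{CL}_{ij} = 1 - P(n_{ij})$ and $\mathrm{CL}_j = \max(\mathrm{CL}_{1j}, \mathrm{CL}_{2j})$. The function names are hypothetical. The tail is summed in log space via `math.lgamma`, because exact binomial coefficients for counts in the thousands are too large to convert to floats, and the code assumes both classes contain hexamers (0 < q_i < 1).

```python
from math import exp, lgamma, log, log1p

def binomial_tail(n_ij: int, N_j: int, q_i: float) -> float:
    """P(n_ij): probability that at least n_ij of the N_j occurrences of a
    hexamer fall in class i by chance, given prior q_i."""
    log_norm = lgamma(N_j + 1)
    total = 0.0
    for m in range(n_ij, N_j + 1):
        # log of C(N_j, m) * q_i**m * (1 - q_i)**(N_j - m)
        log_term = (log_norm - lgamma(m + 1) - lgamma(N_j - m + 1)
                    + m * log(q_i) + (N_j - m) * log1p(-q_i))
        total += exp(log_term)
    return min(total, 1.0)  # clip tiny floating-point excess above 1

def rank_hexamers(pos_counts, neg_counts):
    """Rank features by CL_j = max(CL_1j, CL_2j) = 1 - min(P(n_1j), P(n_2j)).

    pos_counts[j] / neg_counts[j]: total occurrences of the j-th hexamer in
    the positive (hotspot) / negative (coldspot) samples.
    Returns (feature indices sorted by decreasing CL, list of CL values).
    """
    m1, m2 = sum(pos_counts), sum(neg_counts)
    q1, q2 = m1 / (m1 + m2), m2 / (m1 + m2)  # prior probabilities q_i = m_i / M
    p_min = []
    for n1j, n2j in zip(pos_counts, neg_counts):
        N_j = n1j + n2j
        p_min.append(min(binomial_tail(n1j, N_j, q1), binomial_tail(n2j, N_j, q2)))
    # Sorting by the smaller tail probability gives the same order as sorting
    # by CL_j, but keeps resolution when 1 - P rounds to exactly 1.0.
    order = sorted(range(len(p_min)), key=lambda j: p_min[j])
    return order, [1.0 - p for p in p_min]
```

Because CL is defined as the maximum over the two classes, a hexamer ranks highly whether it is over-represented in hotspots or in coldspots.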

Support vector machine

The support vector machine (SVM) is a supervised machine learning algorithm based on statistical learning theory and has been successfully applied in bioinformatics ⁷⁴. The basic idea of SVM is to map the data into a high-dimensional feature space and then determine the optimal separating hyperplane. For a brief formulation of SVM and how it works, see ⁷⁵,⁷⁶; for more details, see the monograph ⁷⁷. In this study, we used the free software LIBSVM 3.20, developed by Chang and Lin ⁷⁸. Because of its good classification performance, the radial basis function (RBF) kernel was used to obtain the optimal classification hyperplane. Its two parameters, the penalty parameter C and the kernel parameter γ, were preliminarily optimized through a grid search.
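The RBF kernel and a grid search over (C, γ) can be sketched as below. The search ranges are an assumption: they mirror the defaults of LIBSVM's `grid.py` (log2 C from −5 to 15 and log2 γ from 3 to −15, in steps of 2), since the paper does not report its ranges. `score` is a hypothetical callback standing in for the cross-validated accuracy of an SVM trained with the given parameters.

```python
import math
from typing import Callable, Sequence

def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def grid_search(score: Callable[[float, float], float],
                log2_c=range(-5, 16, 2), log2_gamma=range(3, -16, -2)):
    """Return (best_score, C, gamma) over an exponential (C, gamma) grid.

    score(C, gamma) is supplied by the caller, e.g. the 5-fold
    cross-validated accuracy of an RBF-kernel SVM.
    """
    best = None
    for lc in log2_c:
        for lg in log2_gamma:
            c, g = 2.0 ** lc, 2.0 ** lg
            s = score(c, g)
            if best is None or s > best[0]:  # keep the first best on ties
                best = (s, c, g)
    return best
```

An exponentially spaced grid is the usual choice because reasonable values of C and γ span several orders of magnitude; with these default ranges the search trains 11 × 10 = 110 models.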

The proposed predictor thus built up is called iRSpot-Pse6NC, where “i” stands for “identify”, “RSpot” for “Recombination Spots”, and “Pse6NC” for “Pseudo 6-tuple Nucleotide Composition”.

Results and Discussion

Cross-validation

To evaluate the quality of a new predictor, one needs to consider two questions: (i) what metrics should be used to measure its performance, and (ii) what test method should be used to calculate these metrics? In the literature, the following four metrics are commonly used to measure a predictor's quality ⁷⁹: (i) overall accuracy (Acc); (ii) stability, measured by the Matthews correlation coefficient (MCC); (iii) sensitivity (Sn); and (iv) specificity (Sp). However, their conventional expressions, taken directly from mathematics textbooks, lack intuition and are difficult for most biological scientists to understand. Fortunately, using the symbols introduced by Chou in studying signal peptides ²³, the four conventional metrics can be converted into a set of intuitive ones ¹⁶,⁸⁰,⁸¹, as given below:

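$$
\begin{cases}
\mathrm{Sn} = 1 - \dfrac{N^{+}_{-}}{N^{+}}, & 0 \le \mathrm{Sn} \le 1 \\[2ex]
\mathrm{Sp} = 1 - \dfrac{N^{-}_{+}}{N^{-}}, & 0 \le \mathrm{Sp} \le 1 \\[2ex]
\mathrm{Acc} = 1 - \dfrac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}, & 0 \le \mathrm{Acc} \le 1 \\[2ex]
\mathrm{MCC} = \dfrac{1 - \left(\dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}}\right)}{\sqrt{\left(1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}}\right)\left(1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}}\right)}}, & -1 \le \mathrm{MCC} \le 1
\end{cases}
\tag{9}
$$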

where $N^{+}$ is the total number of positive samples investigated and $N^{+}_{-}$ is the number of positive samples incorrectly predicted as negative; $N^{-}$ is the total number of negative samples investigated and $N^{-}_{+}$ is the number of negative samples incorrectly predicted as positive.

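As pointed out in many recent publications (see, e.g., ²²,³²,³³,⁵⁰,⁸²⁻⁹⁰), the meanings of Sn, Sp, Acc and MCC become much more intuitive when expressed as in Eq. 9. For example, Sn = 1 when no positive sample is misclassified ($N^{+}_{-} = 0$), and Sn = 0 when every positive sample is misclassified ($N^{+}_{-} = N^{+}$).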
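The four metrics can be computed directly from the four counts, as in the sketch below. The function name and argument names are hypothetical. MCC is written in Chou's form; it is algebraically identical to the textbook MCC with TP = N⁺ − N⁺₋, FN = N⁺₋, TN = N⁻ − N⁻₊ and FP = N⁻₊. The denominator is zero only if the predictor labels every sample negative or every sample positive.

```python
import math

def chou_metrics(n_pos: int, n_pos_missed: int, n_neg: int, n_neg_missed: int):
    """Sn, Sp, Acc and MCC in the intuitive form of Eq. 9.

    n_pos        = N+   : number of positive samples
    n_pos_missed = N+_- : positives predicted as negative (false negatives)
    n_neg        = N-   : number of negative samples
    n_neg_missed = N-_+ : negatives predicted as positive (false positives)
    """
    fn_rate = n_pos_missed / n_pos
    fp_rate = n_neg_missed / n_neg
    sn = 1 - fn_rate
    sp = 1 - fp_rate
    acc = 1 - (n_pos_missed + n_neg_missed) / (n_pos + n_neg)
    denom = math.sqrt((1 + (n_neg_missed - n_pos_missed) / n_pos)
                      * (1 + (n_pos_missed - n_neg_missed) / n_neg))
    mcc = (1 - (fn_rate + fp_rate)) / denom
    return sn, sp, acc, mcc
```

For instance, with the benchmark's class sizes (490 hotspots, 591 coldspots) and illustrative error counts of 119 missed hotspots and 53 false hotspots, the function gives Sn ≈ 0.757, Sp ≈ 0.910 and MCC ≈ 0.681.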

With a set of intuitive metrics defined, the next question is how to estimate their values. The independent dataset test, the subsampling (or K-fold cross-validation) test and the jackknife test are the three cross-validation methods widely used for testing a prediction method ⁹¹. To reduce computational cost, we adopted 5-fold cross-validation (K = 5) in this study, as done by many investigators using SVM as the prediction engine (see, e.g., ²⁴,²⁶,⁹²⁻⁹⁵).
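A minimal sketch of generating the five train/test splits is shown below. The paper does not state whether its folds were stratified; this version assumes stratification, shuffling each class separately with a fixed seed and dealing indices round-robin so that every fold keeps roughly the 490:591 hotspot-to-coldspot ratio. The helper names are hypothetical.

```python
import random

def stratified_k_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # randomize which samples land in which fold
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = stratified_k_folds(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test
```

Every sample is predicted exactly once, while its own fold is held out, so the counts in Eq. 9 can be accumulated over the five test folds.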

Comparison with existing methods

Table 1 lists the metrics (Eq. 9) achieved by iRSpot-Pse6NC in 5-fold cross-validation on the benchmark dataset. For comparison, the table also lists the corresponding rates obtained by iRSpot-PseDNC ¹⁶, iRSpot-KNCPseAAC ¹⁸ and IDQD ¹⁵ using exactly the same cross-validation method and benchmark dataset. iRSpot-Pse6NC achieves the highest values in all four metrics. The margins are largest for Sn (0.7571 versus at most 0.6959) and MCC (0.6805 versus at most 0.5585), while its Sp is only slightly higher than that of iRSpot-PseDNC (0.9103 versus 0.9052). These results indicate that the proposed predictor outperforms the existing predictors on this benchmark.

Table 1. A comparison of the proposed predictor with existing ones.

Method	Sn^a	Sp^a	Acc^a	MCC^a
iRSpot-Pse6NC^b	0.7571	0.9103	0.8408	0.6805
iRSpot-PseDNC^c	0.6234	0.9052	0.7792	0.5585
iRSpot-KNCPseAAC^d	0.6102	0.8951	0.7660	0.5334
IDQD^e	0.6959	0.7509	0.7259	0.4469


^aSee Eq.9 for the metrics definition

^bProposed in this paper

^cFrom ¹⁶

^dFrom ¹⁸

^eFrom ¹⁵

Feature analysis

As described in the section on hexamer composition, the hexamer vector has 4096 dimensions, which is large enough to cause the high-dimensionality problems noted above. To exclude noisy and redundant features, we used incremental feature selection (IFS) to find the feature subset that maximizes accuracy. First, the 4096 hexamers were ranked according to Eqs. 5–8. Next, 4096 feature subsets were constructed: the first subset contained the top-ranked hexamer, the second was produced by adding the second-ranked hexamer to the first subset, and so on. Finally, SVM with 5-fold cross-validation was used to evaluate the accuracy of each of the 4096 feature subsets. The resulting IFS curve, with Acc on the vertical axis and the number of features on the horizontal axis, is plotted in Figure 3. The curve peaks at 84.08% with 381 features, substantially higher than the 71.04% obtained with all 4096 features. At the same time, the number of features was reduced from 4096 to 381, indicating that the proposed feature selection technique can pick out informative hexamers and thereby improve prediction quality. Accordingly, these 381 hexamers were selected as the optimal feature subset to train the prediction model.
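The IFS procedure described above can be written generically as follows. This sketch is hypothetical: `evaluate` stands in for training and 5-fold cross-validating an SVM on the given feature subset, which is where nearly all the cost lies (4096 SVM evaluations in the paper's setting). Resolving ties in favour of the smaller subset is an assumption.

```python
from typing import Callable, Sequence

def incremental_feature_selection(ranked: Sequence[int],
                                  evaluate: Callable[[list[int]], float]):
    """Evaluate the top-1, top-2, ..., top-n ranked features.

    ranked: feature indices sorted by decreasing CL (Eq. 8).
    evaluate: caller-supplied score for a feature subset, e.g. 5-fold
    cross-validated accuracy.
    Returns (best subset size, best score, full IFS curve).
    """
    curve = [evaluate(list(ranked[:n])) for n in range(1, len(ranked) + 1)]
    # max() returns the first maximum, so ties favour the smaller subset.
    best_n = max(range(len(curve)), key=curve.__getitem__) + 1
    return best_n, curve[best_n - 1], curve
```

Plotting `curve` against subset size reproduces an IFS curve of the kind shown in Figure 3.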

Figure 3. The 5-fold cross-validated IFS curve for predicting recombination hotspots and coldspots. An IFS peak of 84.08% was observed when using the top 381 hexamers to perform prediction.

To further evaluate the performance of the optimal model across the full range of SVM decision thresholds, we plotted the ROC curve ⁹⁶ (Figure 4). The area under the ROC curve (AUC) reaches 0.9084, indicating that the proposed method is promising as a high-throughput tool for predicting recombination spots.
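For reference, the AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive sample receives a higher decision value than a randomly chosen negative one (the Mann–Whitney statistic), with ties counted as one half. The minimal sketch below assumes binary labels 1 (hotspot) and 0 (coldspot); the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1, a tie counts 1/2.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The double loop is O(n⁺·n⁻), about 290,000 comparisons for 490 hotspots and 591 coldspots, which is fast enough here; a rank-based version scales better for larger datasets.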

Figure 4. The ROC curve for identifying recombination spots using the 381 optimal hexamers. An AUC of 0.9084 was obtained in 5-fold cross-validation. The diagonal dotted line denotes random guessing (AUC = 0.5).

To further analyze the contributions of different features to the prediction model, we provide a heat map ⁹⁷ (Figure 5), a graphical representation of a matrix in which colors encode the CL values scaled between 0 and 1. As Figure 5 shows, the majority of the 4096 hexamers are blue or green, indicating that most of them are largely irrelevant to recombination spot recognition.

Figure 5. A heat map illustrating the CL of the 4096 hexamers. The color scale ranges from blue (low CL) through green and yellow to red (high CL). See the main text for further explanation. A higher resolution version can be found at http://lin-group.cn/server/iRSpot-Pse6NC/heatmap2.jpg.

Figure 5 also shows that GC-rich hexamers, e.g., CGCCGG, AGCCGG, GCAGCT, GCCGGA and AGTGGG, have the five highest CL values among all features, each with CL > 98.3%.

Moreover, we performed a detailed analysis of the 381 optimal hexamers (CL > 98.3%) to investigate the relationship between these features and GC content (Figure 6). In this figure, the horizontal axis denotes GC content from 0% to 100%, and the vertical axis indicates the percentage of positive and negative samples at the GC content shown on the horizontal axis. Optimal hexamers with high GC content account for a higher proportion in positive samples, whereas those with lower GC content account for a higher proportion in negative samples. This indicates a close relationship between GC content and recombination hotspots, further supporting the GC-based strategy we used to rebuild the dataset.
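One plausible way to tabulate selected hexamers by GC content is sketched below; it is not necessarily how Figure 6 was produced. In this hypothetical sketch, each selected hexamer is assigned to the class in which it is enriched (1 = hotspot, 2 = coldspot), for example the class i with the larger $\mathrm{CL}_{ij}$. The sketch then reports, for each GC level, the share of hexamers enriched in each class. Because a hexamer has only six bases, its GC content can take only seven values (0/6 to 6/6).

```python
from collections import Counter

def gc_fraction(kmer: str) -> float:
    """Fraction of G and C bases in a k-mer."""
    kmer = kmer.upper()
    return sum(base in "GC" for base in kmer) / len(kmer)

def gc_profile(enriched_class: dict[str, int]) -> dict[float, tuple[float, float]]:
    """Map each GC level to (share enriched in class 1, share enriched in class 2)."""
    by_level: dict[float, Counter] = {}
    for kmer, cls in enriched_class.items():
        if cls not in (1, 2):
            raise ValueError(f"class must be 1 or 2, got {cls!r}")
        level = round(gc_fraction(kmer), 3)  # e.g. 4/6 -> 0.667
        by_level.setdefault(level, Counter())[cls] += 1
    profile = {}
    for level in sorted(by_level):
        counts = by_level[level]
        total = counts[1] + counts[2]
        profile[level] = (counts[1] / total, counts[2] / total)
    return profile
```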

Figure 6. The relationship between the important features and GC content.

Web-server and user guide

As pointed out in ²⁵ and demonstrated in many follow-up publications (see, e.g., ²⁸,³⁰,³²,³⁵,⁸¹,⁹⁸⁻¹¹⁶), user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful predictors. Indeed, a new prediction method accompanied by a user-friendly web-server can significantly enhance its impact ³⁶,⁴⁹. In view of this, a web-server for iRSpot-Pse6NC has been established. To maximize convenience for experimental scientists, step-by-step instructions are given below.

Step 1. Open the web server at http://lin-group.cn/server/iRSpot-Pse6NC to see the top page of iRSpot-Pse6NC on your computer screen (Figure 7).

Figure 7. A partial screenshot of the top page of the iRSpot-Pse6NC webserver at http://lin-group.cn/server/iRSpot-Pse6NC.

Step 2. Click the WEB SERVER button to start the prediction. Type or copy/paste the query DNA sequences into the input box at the center of Figure 7. The input sequences should be in FASTA format. Then click the Submit button to see the predicted result.

Step 3. Click on the DOWNLOAD button to download the benchmark data sets used to train and test the iRSpot-Pse6NC predictor.

Step 4. Click on the CITATION button to find the relevant papers that document the detailed development and algorithm of iRSpot-Pse6NC.

Step 5. Click the HELP button to view the relevant instructions and caveats for using the server.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61772119, 31771471), the Applied Basic Research Program of Sichuan Province (No. 2015JY0100), the Fundamental Research Funds for the Central Universities of China (Nos. ZYGX2015Z006, ZYGX2016J125, ZYGX2016J118), the Natural Science Foundation for Distinguished Young Scholars of Hebei Province (No. C2017209244), and the Program for the Top Young Innovative Talents of Higher Learning Institutions of Hebei Province (No. BJ2014028).

Author Contributions

H.L. conceived and designed the experiments; H.Y., W.R.Q., G.L., F.B.G. and W.C. analyzed the data and implemented SVM; H.Y., H.L. and W.C. established the web-server; H.Y., W.C., K.C.C. and H.L. performed the analysis and wrote the paper. All authors read and approved the final manuscript.

References

1. Keeney S. Spo11 and the Formation of DNA Double-Strand Breaks in Meiosis. Genome Dyn Stab. 2008;2:81–123. doi: 10.1007/7050_2007_026.
2. Zenvirth D, Arbel T, Sherman A. et al. Multiple sites for double-strand breaks in whole meiotic chromosomes of Saccharomyces cerevisiae. EMBO J. 1992;11:3441–7. doi: 10.1002/j.1460-2075.1992.tb05423.x.
3. Gerton JL, DeRisi J, Shroff R. et al. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2000;97:11383–90. doi: 10.1073/pnas.97.21.11383.
4. Marais G, Mouchiroud D, Duret L. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci U S A. 2001;98:5688–92. doi: 10.1073/pnas.091427698.
5. Myers S, Bottolo L, Freeman C. et al. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005;310:321–4. doi: 10.1126/science.1117196.
6. Baudat F, Nicolas A. Clustering of meiotic double-strand breaks on yeast chromosome III. Proc Natl Acad Sci U S A. 1997;94:5213–8. doi: 10.1073/pnas.94.10.5213.
7. Klein S, Zenvirth D, Dror V. et al. Patterns of meiotic double-strand breakage on native and artificial yeast chromosomes. Chromosoma. 1996;105:276–84. doi: 10.1007/BF02524645.
8. Kohl KP, Sekelsky J. Meiotic and mitotic recombination in meiosis. Genetics. 2013;194:327–34. doi: 10.1534/genetics.113.150581.
9. Lichten M, Goldman AS. Meiotic recombination hotspots. Annu Rev Genet. 1995;29:423–44. doi: 10.1146/annurev.ge.29.120195.002231.
10. Ito M, Kugou K, Fawcett JA. et al. Meiotic recombination cold spots in chromosomal cohesion sites. Genes Cells. 2014;19:359–73. doi: 10.1111/gtc.12138.
11. Parvanov ED, Petkov PM, Paigen K. Prdm9 controls activation of mammalian recombination hotspots. Science. 2010;327:835. doi: 10.1126/science.1181495.
12. Zhang B, Liu G. Predicting recombination hotspots in yeast based on DNA sequence and chromatin structure. Curr Bioinfor. 2014;9:28–33.
13. Jiang P, Wu H, Wei J. et al. RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Res. 2007;35:W47–51. doi: 10.1093/nar/gkm217.
14. Zhou T, Weng J, Sun X. et al. Support vector machine for classification of meiotic recombination hotspots and coldspots in Saccharomyces cerevisiae based on codon composition. BMC Bioinfor. 2006;7:223. doi: 10.1186/1471-2105-7-223.
15. Liu G, Liu J, Cui X. et al. Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. J Theor Biol. 2012;293:49–54. doi: 10.1016/j.jtbi.2011.10.004.
16. Chen W, Feng PM, Lin H. et al. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 2013;41:e68. doi: 10.1093/nar/gks1450.
17. Li L, Yu S, Xiao W. et al. Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM. BMC Bioinfor. 2014;15:340. doi: 10.1186/1471-2105-15-340.
18. Qiu WR, Xiao X, Chou KC. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int J Mol Sci. 2014;15:1746–66. doi: 10.3390/ijms15021746.
19. Liu G, Xing Y, Cai L. Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol. 2015;382:15–22. doi: 10.1016/j.jtbi.2015.06.030.
20. Liu B, Liu Y, Jin X. et al. iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci Rep. 2016;6:33483. doi: 10.1038/srep33483.
21. Dong C, Yuan YZ, Zhang FZ. et al. Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots. Mol Biosyst. 2016;12:2893–900. doi: 10.1039/c6mb00374e.
22. Liu B, Wang S, Long R. et al. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics. 2017;33:35–41. doi: 10.1093/bioinformatics/btw539.
23. Chou KC. Prediction of signal peptides using scaled window. Peptides. 2001;22:1973–9. doi: 10.1016/s0196-9781(01)00540-x.
24. Lin H, Deng EZ, Ding H. et al. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 2014;42:12961–72. doi: 10.1093/nar/gku1019.
25. Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol. 2011;273:236–47. doi: 10.1016/j.jtbi.2010.12.024.
26. Meher PK, Sahu TK, Saini V. et al. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep. 2017;7:42362. doi: 10.1038/srep42362.
27. Tripathi P, Pandey PN. A novel alignment-free method to classify protein folding types by combining spectral graph clustering with Chou's pseudo amino acid composition. J Theor Biol. 2017;424:49–54. doi: 10.1016/j.jtbi.2017.04.027.
28. Cheng X, Zhao SC, Lin WZ. et al. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics. 2017;33:3524–31. doi: 10.1093/bioinformatics/btx476.
29. Chen W, Feng P, Yang H. et al. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget. 2017;8:4208–17. doi: 10.18632/oncotarget.13758.
30. Lai HY, Chen XX, Chen W. et al. Sequence-based predictive modeling to identify cancerlectins. Oncotarget. 2017;8:28169–75. doi: 10.18632/oncotarget.15963.
31. Qiu WR, Sun BQ, Xiao X. et al. iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics. 2017. doi: 10.1016/j.ygeno.2017.10.008.
32. Dao FY, Yang H, Su ZD. et al. Recent advances in conotoxin classification by using machine learning methods. Molecules. 2017;22:1057. doi: 10.3390/molecules22071057.
33. Feng P, Yang H, Ding H. et al. iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics. 2018. doi: 10.1016/j.ygeno.2018.01.005.
34. Zhao YW, Lai HY, Tang H. et al. Prediction of phosphothreonine sites in human proteins by fusing different features. Sci Rep. 2016;6:34817. doi: 10.1038/srep34817.
35. Yang H, Tang H, Chen XX. et al. Identification of secretory proteins in mycobacterium tuberculosis using pseudo amino acid composition. BioMed Res Int. 2016;2016:5413903. doi: 10.1155/2016/5413903.
36. Chou KC. Impacts of bioinformatics to medicinal chemistry. Med Chem. 2015;11:218–34. doi: 10.2174/1573406411666141229162834.
37. Chou KC. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins. 2001;43:246–55. doi: 10.1002/prot.1035.
38.Zhong WZ, Zhou SF. Molecular science for drug development and biomedicine. Int J Mol Sci. 2014;15:20072–8. doi: 10.3390/ijms151120072. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Zhou GP, Zhong WZ. Perspectives in Medicinal Chemistry. Curr Top Med Chem. 2016;16:381–2. doi: 10.2174/156802661604151014114030. [DOI] [PubMed] [Google Scholar]
40.Esmaeili M, Mohabatkar H, Mohsenzadeh S. Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. J Theor Biol. 2010;263:203–9. doi: 10.1016/j.jtbi.2009.11.016. [DOI] [PubMed] [Google Scholar]
41.Mohammad Beigi M, Behjati M, Mohabatkar H. Prediction of metalloproteinase family based on the concept of Chou's pseudo amino acid composition using a machine learning approach. J Struct Funct Genomics. 2011;12:191–7. doi: 10.1007/s10969-011-9120-4. [DOI] [PubMed] [Google Scholar]
42.Tang H, Su ZD, Wei HH. et al. Prediction of cell-penetrating peptides with feature selection techniques. Biochem Biophys Res Commun. 2016;477:150–4. doi: 10.1016/j.bbrc.2016.06.035. [DOI] [PubMed] [Google Scholar]
43.Pacharawongsakda E, Theeramunkong T. Predict Subcellular Locations of Singleplex and Multiplex Proteins by Semi-Supervised Learning and Dimension-Reducing General Mode of Chou's PseAAC. IEEE Trans Nanobioscience. 2013;12:311–20. doi: 10.1109/TNB.2013.2272014. [DOI] [PubMed] [Google Scholar]
44.Nanni L, Brahnam S, Lumini A. Prediction of protein structure classes by incorporating different protein descriptors into general Chou's pseudo amino acid composition. J Theor Biol. 2014;360:109–16. doi: 10.1016/j.jtbi.2014.07.003. [DOI] [PubMed] [Google Scholar]
45.Sharma R, Dehzangi A, Lyons J. et al. Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou's General PseAAC. IEEE Trans Nanobioscience. 2015;14:915–26. doi: 10.1109/TNB.2015.2500186. [DOI] [PubMed] [Google Scholar]
46.Ding H, Liu L, Guo FB. et al. Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition. Protein Pept Lett. 2011;18:58–63. doi: 10.2174/092986611794328708. [DOI] [PubMed] [Google Scholar]
47.Yu B, Li S, Qiu WY. et al. Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising. Oncotarget. 2017;8:107640–65. doi: 10.18632/oncotarget.22585. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Zhang S, Duan X. Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol. 2018;437:239–50. doi: 10.1016/j.jtbi.2017.10.030. [DOI] [PubMed] [Google Scholar]
49.Chou KC. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem. 2017;17:2337–58. doi: 10.2174/1568026617666170414145508. [DOI] [PubMed] [Google Scholar]
50.Liu B, Yang F, Chou KC. 2L-piRNA: A two-layer ensemble classifier for identifying piwi-interacting RNAs and their function. Mol Ther - Nucleic Acids. 2017;7:267–77. doi: 10.1016/j.omtn.2017.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Chen W, Lei TY, Jin DC. et al. PseKNC: a flexible web-server for generating pseudo K-tuple nucleotide composition. Anal Biochem. 2014;456:53–60. doi: 10.1016/j.ab.2014.04.001. [DOI] [PubMed] [Google Scholar]
52.Chen W, Zhang X, Brooker J. et al. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics. 2015;31:119–20. doi: 10.1093/bioinformatics/btu602. [DOI] [PubMed] [Google Scholar]
53.Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol BioSyst. 2015;11:2620–34. doi: 10.1039/c5mb00155b. [DOI] [PubMed] [Google Scholar]
54.Ghandi M, Mohammad-Noori M, Beer MA. Robust k-mer frequency estimation using gapped k-mers. J Math Biol. 2014;69:469–500. doi: 10.1007/s00285-013-0705-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Hua ZG, Lin Y, Yuan YZ. et al. ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes. Nucleic Acids Res. 2015;43:W85–90. doi: 10.1093/nar/gkv491. [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Li WC, Deng EZ, Ding H. et al. iORI-PseKNC: a predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemom Intell Lab Syst. 2015;141:100–6. [Google Scholar]
57.Lin H, Liang ZY, Tang H. et al. Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans Comput Biol Bioinform. 2017. doi: 10.1109/TCBB.2017.2666141. [DOI] [PubMed] [Google Scholar]
58.Guo SH, Deng EZ, Xu LQ. et al. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics. 2014;30:1522–9. doi: 10.1093/bioinformatics/btu083. [DOI] [PubMed] [Google Scholar]
59.Li WC, Zhong ZJ, Zhu PP. et al. Sequence analysis of origins of replication in the Saccharomyces cerevisiae genomes. Front Microbiol. 2014;5:574. doi: 10.3389/fmicb.2014.00574. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Hsieh LC, Luo L, Ji F. et al. Minimal model for genome evolution and growth. Phys Rev Lett. 2003;90:018101. doi: 10.1103/PhysRevLett.90.018101. [DOI] [PubMed] [Google Scholar]
61.Lin H, Li QZ. Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci. 2011;130:91–100. doi: 10.1007/s12064-010-0114-8. [DOI] [PubMed] [Google Scholar]
62.Feng PM, Chen W, Lin H. et al. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem. 2013;442:118–25. doi: 10.1016/j.ab.2013.05.024. [DOI] [PubMed] [Google Scholar]
63.Ding C, Yuan LF, Guo SH. et al. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. J Proteomics. 2012;77:321–8. doi: 10.1016/j.jprot.2012.09.006. [DOI] [PubMed] [Google Scholar]
64.Lin H, Ding H, Guo FB. et al. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Mol Divers. 2010;14:667–71. doi: 10.1007/s11030-009-9205-1. [DOI] [PubMed] [Google Scholar]
65.Wu Y, Tang H, Chen W. et al. Predicting human enzyme family classes by using pseudo amino acid composition. Curr Proteomics. 2016;13:99–104. [Google Scholar]
66.Ma J, Gu H. A novel method for predicting protein subcellular localization based on pseudo amino acid composition. BMB Rep. 2010;43:670–6. doi: 10.5483/BMBRep.2010.43.10.670. [DOI] [PubMed] [Google Scholar]
67.Olivier I, Loots du T. A metabolomics approach to characterise and identify various Mycobacterium species. J Microbiol Methods. 2012;88:419–26. doi: 10.1016/j.mimet.2012.01.012. [DOI] [PubMed] [Google Scholar]
68.Du QS, Wang SQ, Xie NZ. et al. 2L-PCA: A two-level principal component analyzer for quantitative drug design and its applications. Oncotarget. 2017;8:70564–78. doi: 10.18632/oncotarget.19757. [DOI] [PMC free article] [PubMed] [Google Scholar]
69.Lin H, Ding H. Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. J Theor Biol. 2011;269:64–9. doi: 10.1016/j.jtbi.2010.10.019. [DOI] [PubMed] [Google Scholar]
70.Tang H, Chen W, Lin H. Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique. Mol BioSyst. 2016;12:1269–75. doi: 10.1039/c5mb00883b. [DOI] [PubMed] [Google Scholar]
71.Zhang X, Lu X, Shi Q. et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics. 2006;7:197. doi: 10.1186/1471-2105-7-197. [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Qureshi MN, Min B, Jo HJ. et al. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study. PLoS One. 2016;11:e0160697. doi: 10.1371/journal.pone.0160697. [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Wang T, Xia T, Hu XM. Geometry preserving projections algorithm for predicting membrane protein types. J Theor Biol. 2010;262:208–13. doi: 10.1016/j.jtbi.2009.09.027. [DOI] [PubMed] [Google Scholar]
74.Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw. 1999;10:988–99. doi: 10.1109/72.788640. [DOI] [PubMed] [Google Scholar]
75.Chou KC, Cai YD. Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem. 2002;277:45765–9. doi: 10.1074/jbc.M204161200. [DOI] [PubMed] [Google Scholar]
76.Qiu WR, Sun BQ, Tang H. et al. Identify and analysis crotonylation sites in histone by using support vector machines. Artif Intell Med. 2017;83:75–81. doi: 10.1016/j.artmed.2017.02.007. [DOI] [PubMed] [Google Scholar]
77.Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press; 2000. Chapter 3. [Google Scholar]
78.Chang CC, Lin CJ. Training nu-support vector classifiers: theory and algorithms. Neural Comput. 2001;13:2119–47. doi: 10.1162/089976601750399335. [DOI] [PubMed] [Google Scholar]
79.Chen J, Liu H, Yang J. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids. 2007;33:423–8. doi: 10.1007/s00726-006-0485-9. [DOI] [PubMed] [Google Scholar]
80.Xu Y, Ding J, Wu LY. et al. iSNO-PseAAC: Predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE. 2013;8:e55844. doi: 10.1371/journal.pone.0055844. [DOI] [PMC free article] [PubMed] [Google Scholar]
81.Xu Y, Shao XJ, Wu LY. et al. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. PeerJ. 2013;1:e171. doi: 10.7717/peerj.171. [DOI] [PMC free article] [PubMed] [Google Scholar]
82.Chen E, Feng PM, Deng EZ. et al. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem. 2014;462:76–83. doi: 10.1016/j.ab.2014.06.022. [DOI] [PubMed] [Google Scholar]
83.Ding H, Li D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids. 2015;47:329–33. doi: 10.1007/s00726-014-1862-4. [DOI] [PubMed] [Google Scholar]
84.Tang H, Zou P, Zhang C. et al. Identification of apolipoprotein using feature selection technique. Sci Rep. 2016;6:30441. doi: 10.1038/srep30441. [DOI] [PMC free article] [PubMed] [Google Scholar]
85.Ding H, Deng EZ, Yuan LF. et al. iCTX-Type: A sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res Int. 2014;2014:286419. doi: 10.1155/2014/286419. [DOI] [PMC free article] [PubMed] [Google Scholar]
86.Chen W, Feng PM, Lin H. et al. iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Res Int. 2014;2014:623149. doi: 10.1155/2014/623149. [DOI] [PMC free article] [PubMed] [Google Scholar]
87.Xu Y, Li C, Chou KC. iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC. Med Chem. 2017;13:544–51. doi: 10.2174/1573406413666170419150052. [DOI] [PubMed] [Google Scholar]
88.Feng PM, Lin H, Chen W. Identification of antioxidants from sequence information using Naive Bayes. Comput Math Method Med. 2013;2013:567529. doi: 10.1155/2013/567529. [DOI] [PMC free article] [PubMed] [Google Scholar]
89.Feng P, Ding H, Yang H. et al. iRNA-PseColl: Identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol Ther - Nucleic Acids. 2017;7:155–63. doi: 10.1016/j.omtn.2017.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
90.Feng PM, Ding H, Chen W. et al. Naive Bayes classifier with feature selection to identify phage virion proteins. Comput Math Method Med. 2013;2013:530696. doi: 10.1155/2013/530696. [DOI] [PMC free article] [PubMed] [Google Scholar]
91.Chou KC, Zhang CT. Prediction of protein structural classes. Crit Rev Biochem Mol Biol. 1995;30:275–349. doi: 10.3109/10409239509083488. [DOI] [PubMed] [Google Scholar]
92.Rahimi M, Bakhtiarizadeh MR, Mohammadi-Sangcheshmeh A. OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition. J Theor Biol. 2017;414:128–36. doi: 10.1016/j.jtbi.2016.11.028. [DOI] [PubMed] [Google Scholar]
93.Khan M, Hayat M, Khan SA. et al. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. J Theor Biol. 2017;415:13–9. doi: 10.1016/j.jtbi.2016.12.004. [DOI] [PubMed] [Google Scholar]
94.Tahir M, Hayat M, Kabir M. Sequence based predictor for discrimination of enhancer and their types by applying general form of Chou's trinucleotide composition. Comput Methods Programs Biomed. 2017;146:69–75. doi: 10.1016/j.cmpb.2017.05.008. [DOI] [PubMed] [Google Scholar]
95.Guo FB, Dong C, Hua HL. et al. Accurate prediction of human essential genes using only nucleotide composition and association information. Bioinformatics. 2017;33:1758–64. doi: 10.1093/bioinformatics/btx055. [DOI] [PMC free article] [PubMed] [Google Scholar]
96.Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27:861–74. [Google Scholar]
97.Ding H, Feng PM, Chen W. et al. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol Biosyst. 2014;10:2229–35. doi: 10.1039/c4mb00316k. [DOI] [PubMed] [Google Scholar]
98.Chen W, Feng P, Tang H. et al. Identifying 2'-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics. 2016;107:255–8. doi: 10.1016/j.ygeno.2016.05.003. [DOI] [PubMed] [Google Scholar]
99.Xiao X, Wang P, Lin WZ. et al. iAMP-2L: A two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem. 2013;436:168–77. doi: 10.1016/j.ab.2013.01.019. [DOI] [PubMed] [Google Scholar]
100.Chen W, Yang H, Feng PM. et al. iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties. Bioinformatics. 2017;33:3518–23. doi: 10.1093/bioinformatics/btx479. [DOI] [PubMed] [Google Scholar]
101.Lin H, Ding C, Yuan LF. et al. Predicting subchloroplast locations of proteins based on the general form of Chou's pseudo amino acid composition: approached from optimal tripeptide composition. Int J Biomath. 2013;6:1350003. [Google Scholar]
102.Jia J, Zhang L, Liu Z. et al. pSumo-CD: Predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics. 2016;32:3133–41. doi: 10.1093/bioinformatics/btw387. [DOI] [PubMed] [Google Scholar]
103.Chen W, Feng P, Tang H. et al. RAMPred: identifying the N-1-methyladenosine sites in eukaryotic transcriptomes. Sci Rep. 2016;6:31080. doi: 10.1038/srep31080. [DOI] [PMC free article] [PubMed] [Google Scholar]
104.Chen W, Tang H, Lin H. MethyRNA: a web server for identification of N6-methyladenosine sites. J Biomol Struct Dyn. 2017;35:683–7. doi: 10.1080/07391102.2016.1157761. [DOI] [PubMed] [Google Scholar]
105.Cheng X, Xiao X, Chou KC. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics. 2017. doi: 10.1016/j.ygeno.2017.10.002. [DOI] [PubMed] [Google Scholar]
106.Lin H, Liu WX, He J. et al. Predicting cancerlectins by the optimal g-gap dipeptides. Sci Rep. 2015;5:16964. doi: 10.1038/srep16964. [DOI] [PMC free article] [PubMed] [Google Scholar]
107.Zhu PP, Li WC, Zhong ZJ. et al. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol Biosyst. 2015;11:558–63. doi: 10.1039/c4mb00645c. [DOI] [PubMed] [Google Scholar]
108.Chen XX, Tang H, Li WC. et al. Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition. BioMed Res Int. 2016;2016:1654623. doi: 10.1155/2016/1654623. [DOI] [PMC free article] [PubMed] [Google Scholar]
109.Zhao YW, Su ZD, Yang W. et al. IonchanPred 2.0: a tool to predict ion channels and their types. Int J Mol Sci. 2017;18:1838. doi: 10.3390/ijms18091838. [DOI] [PMC free article] [PubMed] [Google Scholar]
110.Cao R, Freitas C, Chan L. et al. ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules. 2017;22:1732. doi: 10.3390/molecules22101732. [DOI] [PMC free article] [PubMed] [Google Scholar]
111.Liang ZY, Lai HY, Yang H. et al. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics. 2017;33:467–9. doi: 10.1093/bioinformatics/btw630. [DOI] [PubMed] [Google Scholar]
112.Cao R, Adhikari B, Bhattacharya D. et al. QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics. 2017;33:586–8. doi: 10.1093/bioinformatics/btw694. [DOI] [PMC free article] [PubMed] [Google Scholar]
113.Zhang T, Tan P, Wang L. et al. RNALocate: a resource for RNA Subcellular Localizations. Nucleic Acids Res. 2017;45:D135–8. doi: 10.1093/nar/gkw728. [DOI] [PMC free article] [PubMed] [Google Scholar]
114.Cao R, Bhattacharya D, Hou J. et al. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics. 2016;17:495. doi: 10.1186/s12859-016-1405-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
115.Li N, Kang J, Jiang L. et al. PSBinder: A Web Service for Predicting Polystyrene Surface-Binding Peptides. BioMed Res Int. 2017;2017:5761517. doi: 10.1155/2017/5761517. [DOI] [PMC free article] [PubMed] [Google Scholar]
116.He B, Chai G, Duan Y. et al. BDB: biopanning data bank. Nucleic Acids Res. 2016;44:D1127–32. doi: 10.1093/nar/gkv1100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B1] 1.Keeney S. Spo11 and the Formation of DNA Double-Strand Breaks in Meiosis. Genome Dyn Stab. 2008;2:81–123. doi: 10.1007/7050_2007_026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] 2.Zenvirth D, Arbel T, Sherman A. et al. Multiple sites for double-strand breaks in whole meiotic chromosomes of Saccharomyces cerevisiae. EMBO J. 1992;11:3441–7. doi: 10.1002/j.1460-2075.1992.tb05423.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] 3.Gerton JL, DeRisi J, Shroff R. et al. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2000;97:11383–90. doi: 10.1073/pnas.97.21.11383. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] 4.Marais G, Mouchiroud D, Duret L. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci U S A. 2001;98:5688–92. doi: 10.1073/pnas.091427698. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5] 5.Myers S, Bottolo L, Freeman C. et al. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005;310:321–4. doi: 10.1126/science.1117196. [DOI] [PubMed] [Google Scholar]

[B6] 6.Baudat F, Nicolas A. Clustering of meiotic double-strand breaks on yeast chromosome III. Proc Natl Acad Sci U S A. 1997;94:5213–8. doi: 10.1073/pnas.94.10.5213. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7.Klein S, Zenvirth D, Dror V. et al. Patterns of meiotic double-strand breakage on native and artificial yeast chromosomes. Chromosoma. 1996;105:276–84. doi: 10.1007/BF02524645. [DOI] [PubMed] [Google Scholar]

[B8] 8.Kohl KP, Sekelsky J. Meiotic and mitotic recombination in meiosis. Genetics. 2013;194:327–34. doi: 10.1534/genetics.113.150581. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9.Lichten M, Goldman AS. Meiotic recombination hotspots. Annu Rev Genet. 1995;29:423–44. doi: 10.1146/annurev.ge.29.120195.002231. [DOI] [PubMed] [Google Scholar]

[B10] 10.Ito M, Kugou K, Fawcett JA. et al. Meiotic recombination cold spots in chromosomal cohesion sites. Genes Cells. 2014;19:359–73. doi: 10.1111/gtc.12138. [DOI] [PubMed] [Google Scholar]

[B11] 11.Parvanov ED, Petkov PM, Paigen K. Prdm9 controls activation of mammalian recombination hotspots. Science. 2010;327:835. doi: 10.1126/science.1181495. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12] 12.Zhang B, Liu G. Predicting recombination hotspots in yeast based on DNA sequence and chromatin structure. Curr Bioinfor. 2014;9:28–33. [Google Scholar]

[B13] 13.Jiang P, Wu H, Wei J. et al. RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Res. 2007;35:W47–51. doi: 10.1093/nar/gkm217. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14.Zhou T, Weng J, Sun X. et al. Support vector machine for classification of meiotic recombination hotspots and coldspots in Saccharomyces cerevisiae based on codon composition. BMC Bioinfor. 2006;7:223. doi: 10.1186/1471-2105-7-223. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] 15.Liu G, Liu J, Cui X. et al. Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. J Theor Biol. 2012;293:49–54. doi: 10.1016/j.jtbi.2011.10.004. [DOI] [PubMed] [Google Scholar]

[B16] 16.Chen W, Feng PM, Lin H. et al. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 2013;41:e68. doi: 10.1093/nar/gks1450. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17.Li L, Yu S, Xiao W. et al. Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM. BMC Bioinfor. 2014;15:340. doi: 10.1186/1471-2105-15-340. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18.Qiu WR, Xiao X, Chou KC. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int J Mol Sci. 2014;15:1746–66. doi: 10.3390/ijms15021746. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19.Liu G, Xing Y, Cai L. Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol. 2015;382:15–22. doi: 10.1016/j.jtbi.2015.06.030. [DOI] [PubMed] [Google Scholar]

[B20] 20.Liu B, Liu Y, Jin X. et al. iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci Rep. 2016;6:33483. doi: 10.1038/srep33483. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21.Dong C, Yuan YZ, Zhang FZ. et al. Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots. Mol Biosyst. 2016;12:2893–900. doi: 10.1039/c6mb00374e. [DOI] [PubMed] [Google Scholar]

[B22] 22.Liu B, Wang S, Long R. et al. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics. 2017;33:35–41. doi: 10.1093/bioinformatics/btw539. [DOI] [PubMed] [Google Scholar]

[B23] 23.Chou KC. Prediction of signal peptides using scaled window. Peptides. 2001;22:1973–9. doi: 10.1016/s0196-9781(01)00540-x. [DOI] [PubMed] [Google Scholar]

[B24] 24.Lin H, Deng EZ, Ding H. et al. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 2014;42:12961–72. doi: 10.1093/nar/gku1019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25.Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol. 2011;273:236–47. doi: 10.1016/j.jtbi.2010.12.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26.Meher PK, Sahu TK, Saini V. et al. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep. 2017;7:42362. doi: 10.1038/srep42362. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B27] 27.Tripathi P, Pandey PN. A novel alignment-free method to classify protein folding types by combining spectral graph clustering with Chou's pseudo amino acid composition. J Theor Biol. 2017;424:49–54. doi: 10.1016/j.jtbi.2017.04.027. [DOI] [PubMed] [Google Scholar]

[B28] 28.Cheng X, Zhao SC, Lin WZ. et al. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics. 2017;33:3524–31. doi: 10.1093/bioinformatics/btx476. [DOI] [PubMed] [Google Scholar]

[B29] 29.Chen W, Feng P, Yang H. et al. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget. 2017;8:4208–17. doi: 10.18632/oncotarget.13758. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30.Lai HY, Chen XX, Chen W. et al. Sequence-based predictive modeling to identify cancerlectins. Oncotarget. 2017;8:28169–75. doi: 10.18632/oncotarget.15963. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B31] 31.Qiu WR, Sun BQ, Xiao X, iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics; 2017. p. 10. 1016/j.ygeno.201710.008. [DOI] [PubMed] [Google Scholar]

[B32] 32.Dao FY, Yang H, Su ZD. et al. Recent advances in conotoxin classification by using machine learning methods. Molecules. 2017;22:1057. doi: 10.3390/molecules22071057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 33.Feng P, Yang H, Ding H. et al. iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics. 2018 doi: 10.1016/j.ygeno.2018.01.005. doi:10.1016/j.ygeno.2018.01.005. [DOI] [PubMed] [Google Scholar]

[B34] 34.Zhao YW, Lai HY, Tang H. et al. Prediction of phosphothreonine sites in human proteins by fusing different features. Sci Rep. 2016;6:34817. doi: 10.1038/srep34817. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] 35.Yang H, Tang H, Chen XX. et al. Identification of secretory proteins in mycobacterium tuberculosis using pseudo amino acid composition. BioMed Res Int. 2016;2016:5413903. doi: 10.1155/2016/5413903. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B36] 36.Chou KC. Impacts of bioinformatics to medicinal chemistry. Med Chem. 2015;11:218–34. doi: 10.2174/1573406411666141229162834. [DOI] [PubMed] [Google Scholar]

[B37] 37.Chou kC. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins. 2011;43:246–55. doi: 10.1002/prot.1035. [DOI] [PubMed] [Google Scholar]

[B38] 38.Zhong WZ, Zhou SF. Molecular science for drug development and biomedicine. Int J Mol Sci. 2014;15:20072–8. doi: 10.3390/ijms151120072. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B39] 39.Zhou GP, Zhong WZ. Perspectives in Medicinal Chemistry. Curr Top Med Chem. 2016;16:381–2. doi: 10.2174/156802661604151014114030. [DOI] [PubMed] [Google Scholar]

[B40] 40.Esmaeili M, Mohabatkar H, Mohsenzadeh S. Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. J Theor Biol. 2010;263:203–209. doi: 10.1016/j.jtbi.2009.11.016. [DOI] [PubMed] [Google Scholar]

[B41] 41.Mohammad Beigi M, Behjati M, Mohabatkar H. Prediction of metalloproteinase family based on the concept of Chou's pseudo amino acid composition using a machine learning approach. J Struct Funct Genomics. 2011;12:191–7. doi: 10.1007/s10969-011-9120-4. [DOI] [PubMed] [Google Scholar]

[B42] 42.Tang H, Su ZD, Wei HH. et al. Prediction of cell-penetrating peptides with feature selection techniques. Biochem Biophys Res Commun. 2016;477:150–4. doi: 10.1016/j.bbrc.2016.06.035. [DOI] [PubMed] [Google Scholar]

[B43] 43.Pacharawongsakda E, Theeramunkong T. Predict Subcellular Locations of Singleplex and Multiplex Proteins by Semi-Supervised Learning and Dimension-Reducing General Mode of Chou's PseAAC. IEEE Trans Nanobioscience. 2013;12:311–20. doi: 10.1109/TNB.2013.2272014. [DOI] [PubMed] [Google Scholar]

[B44] 44.Nanni L, Brahnam S, Lumini A. Prediction of protein structure classes by incorporating different protein descriptors into general Chou's pseudo amino acid composition. J Theor Biol. 2014;360:109–16. doi: 10.1016/j.jtbi.2014.07.003. [DOI] [PubMed] [Google Scholar]

[B45] 45.Sharma R, Dehzangi A, Lyons J. et al. Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou's General PseAAC. IEEE Trans Nanobioscience. 2015;14:915–26. doi: 10.1109/TNB.2015.2500186. [DOI] [PubMed] [Google Scholar]

[B46] 46.Ding H, Liu L, Guo FB. et al. Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition. Protein Pept Lett. 2011;18:58–63. doi: 10.2174/092986611794328708. [DOI] [PubMed] [Google Scholar]

[B47] 47.Yu B, Li S, Qiu WY. et al. Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising. Oncotarget. 2017;8:107640–65. doi: 10.18632/oncotarget.22585. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B48] 48.Zhang S, Duan X. Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol. 2018;437:239–50. doi: 10.1016/j.jtbi.2017.10.030. [DOI] [PubMed] [Google Scholar]

[B49] 49.Chou KC. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem. 2017;17:2337–58. doi: 10.2174/1568026617666170414145508. [DOI] [PubMed] [Google Scholar]

[B50] 50.Liu B, Yang F, Chou KC. 2L-piRNA: A two-layer ensemble classifier for identifying piwi-interacting RNAs and their function. Mol Ther - Nucleic Acids. 2017;7:267–77. doi: 10.1016/j.omtn.2017.04.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B51] 51.Chen W, Lei TY, Jin DC. et al. PseKNC: a flexible web-server for generating pseudo K-tuple nucleotide composition. Anal Biochem. 2014;456:53–60. doi: 10.1016/j.ab.2014.04.001. [DOI] [PubMed] [Google Scholar]

[B52] 52.Chen W, Zhang X, Brooker J. et al. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics. 2015;31:119–20. doi: 10.1093/bioinformatics/btu602. [DOI] [PubMed] [Google Scholar]

[B53] 53.Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol BioSyst. 2015;11:2620–34. doi: 10.1039/c5mb00155b. [DOI] [PubMed] [Google Scholar]

[B54] 54.Ghandi M, Mohammad-Noori M, Beer MA. Robust k-mer frequency estimation using gapped k-mers. J Math Biol. 2014;69:469–500. doi: 10.1007/s00285-013-0705-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B55] 55.Hua ZG, Lin Y, Yuan YZ. et al. ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes. Nucleic Acids Res. 2015;43:W85–90. doi: 10.1093/nar/gkv491. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B56] 56.Li WC, Deng EZ, Ding H. et al. iORI-PseKNC: a predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemom Intell Lab Syst. 2015;141:100–6. [Google Scholar]

[B57] 57.Lin H, Liang ZY, Tang H. et al. Identifying sigma70 promoters with novel pseudo nucleotide composition. IEEE/ACM Trans Comput Biol Bioinform. 2017 doi: 10.1109/TCBB.2017.2666141. doi: 10.1109/TCBB.2017.2666141. [DOI] [PubMed] [Google Scholar]

[B58] 58.Guo SH, Deng EZ, Xu LQ. et al. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics. 2014;30:1522–9. doi: 10.1093/bioinformatics/btu083.

[B59] 59.Li WC, Zhong ZJ, Zhu PP. et al. Sequence analysis of origins of replication in the Saccharomyces cerevisiae genomes. Front Microbiol. 2014;5:574. doi: 10.3389/fmicb.2014.00574.

[B60] 60.Hsieh LC, Luo L, Ji F. et al. Minimal model for genome evolution and growth. Phys Rev Lett. 2003;90:018101. doi: 10.1103/PhysRevLett.90.018101.

[B61] 61.Lin H, Li QZ. Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci. 2011;130:91–100. doi: 10.1007/s12064-010-0114-8.

[B62] 62.Feng PM, Chen W, Lin H. et al. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem. 2013;442:118–25. doi: 10.1016/j.ab.2013.05.024.

[B63] 63.Ding C, Yuan LF, Guo SH. et al. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. J Proteomics. 2012;77:321–28. doi: 10.1016/j.jprot.2012.09.006.

[B64] 64.Lin H, Ding H, Guo FB. et al. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Mol Divers. 2010;14:667–71. doi: 10.1007/s11030-009-9205-1.

[B65] 65.Wu Y, Tang H, Chen W. et al. Predicting human enzyme family classes by using pseudo amino acid composition. Curr Proteomics. 2016;13:99–104.

[B66] 66.Ma J, Gu H. A novel method for predicting protein subcellular localization based on pseudo amino acid composition. BMB Rep. 2010;43:670–6. doi: 10.5483/BMBRep.2010.43.10.670.

[B67] 67.Olivier I, Loots du T. A metabolomics approach to characterise and identify various Mycobacterium species. J Microbiol Methods. 2012;88:419–26. doi: 10.1016/j.mimet.2012.01.012.

[B68] 68.Du QS, Wang SQ, Xie NZ. et al. 2L-PCA: A two-level principal component analyzer for quantitative drug design and its applications. Oncotarget. 2017;8:70564–78. doi: 10.18632/oncotarget.19757.

[B69] 69.Lin H, Ding H. Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. J Theor Biol. 2011;269:64–9. doi: 10.1016/j.jtbi.2010.10.019.

[B70] 70.Tang H, Chen W, Lin H. Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique. Mol BioSyst. 2016;12:1269–75. doi: 10.1039/c5mb00883b.

[B71] 71.Zhang X, Lu X, Shi Q. et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics. 2006;7:197. doi: 10.1186/1471-2105-7-197.

[B72] 72.Qureshi MN, Min B, Jo HJ. et al. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study. PLoS One. 2016;11:e0160697. doi: 10.1371/journal.pone.0160697.

[B73] 73.Wang T, Xia T, Hu XM. Geometry preserving projections algorithm for predicting membrane protein types. J Theor Biol. 2010;262:208–13. doi: 10.1016/j.jtbi.2009.09.027.

[B74] 74.Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw. 1999;10:988–99. doi: 10.1109/72.788640.

[B75] 75.Chou KC, Cai YD. Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem. 2002;277:45765–9. doi: 10.1074/jbc.M204161200.

[B76] 76.Qiu WR, Sun BQ, Tang H. et al. Identify and analysis crotonylation sites in histone by using support vector machines. Artif Intell Med. 2017;83:75–81. doi: 10.1016/j.artmed.2017.02.007.

[B77] 77.Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press; 2000. Chapter 3.

[B78] 78.Chang CC, Lin CJ. Training nu-support vector classifiers: theory and algorithms. Neural Comput. 2001;13:2119–47. doi: 10.1162/089976601750399335.

[B79] 79.Chen J, Liu H, Yang J. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids. 2007;33:423–8. doi: 10.1007/s00726-006-0485-9.

[B80] 80.Xu Y, Ding J, Wu LY. et al. iSNO-PseAAC: Predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE. 2013;8:e55844. doi: 10.1371/journal.pone.0055844.

[B81] 81.Xu Y, Shao XJ, Wu LY. et al. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. PeerJ. 2013;1:e171. doi: 10.7717/peerj.171.

[B82] 82.Chen E, Feng PM, Deng EZ. et al. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem. 2014;462:76–83. doi: 10.1016/j.ab.2014.06.022.

[B83] 83.Ding H, Li D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids. 2015;47:329–33. doi: 10.1007/s00726-014-1862-4.

[B84] 84.Tang H, Zou P, Zhang C. et al. Identification of apolipoprotein using feature selection technique. Sci Rep. 2016;6:30441. doi: 10.1038/srep30441.

[B85] 85.Ding H, Deng EZ, Yuan LF. et al. iCTX-Type: A sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res Int. 2014;2014:286419. doi: 10.1155/2014/286419.

[B86] 86.Chen W, Feng PM, Lin H. et al. iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Res Int. 2014;2014:623149. doi: 10.1155/2014/623149.

[B87] 87.Xu Y, Li C, Chou KC. iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC. Med Chem. 2017;13:544–51. doi: 10.2174/1573406413666170419150052.

[B88] 88.Feng PM, Lin H, Chen W. Identification of antioxidants from sequence information using Naive Bayes. Comput Math Methods Med. 2013;2013:567529. doi: 10.1155/2013/567529.

[B89] 89.Feng P, Ding H, Yang H. et al. iRNA-PseColl: Identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol Ther Nucleic Acids. 2017;7:155–63. doi: 10.1016/j.omtn.2017.03.006.

[B90] 90.Feng PM, Ding H, Chen W. et al. Naive Bayes classifier with feature selection to identify phage virion proteins. Comput Math Methods Med. 2013;2013:530696. doi: 10.1155/2013/530696.

[B91] 91.Chou KC, Zhang CT. Prediction of protein structural classes. Crit Rev Biochem Mol Biol. 1995;30:275–349. doi: 10.3109/10409239509083488.

[B92] 92.Rahimi M, Bakhtiarizadeh MR, Mohammadi-Sangcheshmeh A. OOgenesis_Pred: A sequence-based method for predicting oogenesis proteins by six different modes of Chou's pseudo amino acid composition. J Theor Biol. 2017;414:128–36. doi: 10.1016/j.jtbi.2016.11.028.

[B93] 93.Khan M, Hayat M, Khan SA. et al. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. J Theor Biol. 2017;415:13–9. doi: 10.1016/j.jtbi.2016.12.004.

[B94] 94.Tahir M, Hayat M, Kabir M. Sequence based predictor for discrimination of enhancer and their types by applying general form of Chou's trinucleotide composition. Comput Methods Programs Biomed. 2017;146:69–75. doi: 10.1016/j.cmpb.2017.05.008.

[B95] 95.Guo FB, Dong C, Hua HL. et al. Accurate prediction of human essential genes using only nucleotide composition and association information. Bioinformatics. 2017;33:1758–64. doi: 10.1093/bioinformatics/btx055.

[B96] 96.Fawcett T. An Introduction to ROC Analysis. Pattern Recognit Lett. 2006;27:861–74.

[B97] 97.Ding H, Feng PM, Chen W. et al. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol Biosyst. 2014;10:2229–35. doi: 10.1039/c4mb00316k.

[B98] 98.Chen W, Feng P, Tang H. et al. Identifying 2'-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics. 2016;107:255–258. doi: 10.1016/j.ygeno.2016.05.003.

[B99] 99.Xiao X, Wang P, Lin WZ. et al. iAMP-2L: A two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem. 2013;436:168–77. doi: 10.1016/j.ab.2013.01.019.

[B100] 100.Chen W, Yang H, Feng PM. et al. iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties. Bioinformatics. 2017;33:3518–23. doi: 10.1093/bioinformatics/btx479.

[B101] 101.Lin H, Ding C, Yuan LF. et al. Predicting subchloroplast locations of proteins based on the general form of Chou's pseudo amino acid composition: approached from optimal tripeptide composition. Int J Biomath. 2013;6:1350003.

[B102] 102.Jia J, Zhang L, Liu Z. et al. pSumo-CD: Predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics. 2016;32:3133–41. doi: 10.1093/bioinformatics/btw387.

[B103] 103.Chen W, Feng P, Tang H. et al. RAMPred: identifying the N-1-methyladenosine sites in eukaryotic transcriptomes. Sci Rep. 2016;6:31080. doi: 10.1038/srep31080.

[B104] 104.Chen W, Tang H, Lin H. MethyRNA: a web server for identification of N6-methyladenosine sites. J Biomol Struct Dyn. 2017;35:683–7. doi: 10.1080/07391102.2016.1157761.

[B105] 105.Cheng X, Xiao X, Chou KC. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics. 2017. doi: 10.1016/j.ygeno.2017.10.002.

[B106] 106.Lin H, Liu WX, He J. et al. Predicting cancerlectins by the optimal g-gap dipeptides. Sci Rep. 2015;5:16964. doi: 10.1038/srep16964.

[B107] 107.Zhu PP, Li WC, Zhong ZJ. et al. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol Biosyst. 2015;11:558–63. doi: 10.1039/c4mb00645c.

[B108] 108.Chen XX, Tang H, Li WC. et al. Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition. BioMed Res Int. 2016;2016:1654623. doi: 10.1155/2016/1654623.

[B109] 109.Zhao YW, Su ZD, Yang W. et al. IonchanPred 2.0: a tool to predict ion channels and their types. Int J Mol Sci. 2017;18:1838. doi: 10.3390/ijms18091838.

[B110] 110.Cao R, Freitas C, Chan L. et al. ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules. 2017;22:1732. doi: 10.3390/molecules22101732.

[B111] 111.Liang ZY, Lai HY, Yang H. et al. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics. 2017;33:467–9. doi: 10.1093/bioinformatics/btw630.

[B112] 112.Cao R, Adhikari B, Bhattacharya D. et al. QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics. 2017;33:586–8. doi: 10.1093/bioinformatics/btw694.

[B113] 113.Zhang T, Tan P, Wang L. et al. RNALocate: a resource for RNA Subcellular Localizations. Nucleic Acids Res. 2017;45:D135–8. doi: 10.1093/nar/gkw728.

[B114] 114.Cao R, Bhattacharya D, Hou J. et al. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics. 2016;17:495. doi: 10.1186/s12859-016-1405-y.

[B115] 115.Li N, Kang J, Jiang L. et al. PSBinder: A Web Service for Predicting Polystyrene Surface-Binding Peptides. BioMed Res Int. 2017;2017:5761517. doi: 10.1155/2017/5761517.

[B116] 116.He B, Chai G, Duan Y. et al. BDB: biopanning data bank. Nucleic Acids Res. 2016;44:D1127–32. doi: 10.1093/nar/gkv1100.

