Molecular Therapy. Nucleic Acids. 2017 Mar 29;7:155–163. doi: 10.1016/j.omtn.2017.03.006

iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC

Pengmian Feng 1, Hui Ding 2, Hui Yang 2, Wei Chen 3,4,∗, Hao Lin 2,4,∗∗, Kuo-Chen Chou 2,4,∗∗∗
PMCID: PMC5415964  PMID: 28624191

Abstract

There are many different types of RNA modifications, which are essential for numerous biological processes. Knowledge of where these modifications occur in an RNA sequence is key to an in-depth understanding of their biological functions and mechanisms. Unfortunately, determining these sites by experiment alone is both time-consuming and laborious. Although some computational methods have been developed for this purpose, each can deal with only one type of modification. To our knowledge, no method has thus far been developed that can identify the occurrence sites of several different types of RNA modifications within one seamless package or platform. To address this challenge, a novel platform called “iRNA-PseColl” has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations have indicated that the anticipated success rates achieved by the proposed platform are quite high. To maximize the convenience for most experimental biologists, the platform’s web-server has been provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide that will allow users to easily achieve their desired results without the need to go through the mathematical details involved in this paper.

Keywords: RNA modification, nucleotide chemicophysical property, collective effect, PseKNC, SVM

Introduction

Since the first modified ribonucleic acid (RNA) was found ∼60 years ago,1 ∼150 RNA modifications have been reported.2 Emerging evidence suggests that RNA modifications are critical components of the gene regulatory landscape and are involved in a variety of biological processes at the post-transcriptional level, such as protein translation and localization,3 mRNA splicing,4 ribosome biogenesis,5 antibiotic resistance,6 and stem cell pluripotency.7 However, many aspects of RNA modifications remain unknown.8 Therefore, detecting the positions of RNA modifications is essential for understanding their molecular mechanisms and functions.

The advent of next-generation sequencing technologies has allowed investigation of RNA modifications on a genome-wide scale.9, 10, 11, 12, 13, 14, 15 For example, the N1-methyladenosine (m1A),9, 10 N6-methyladenosine (m6A),13 and 5-methylcytosine (m5C)15 maps are available for the human transcriptome. Although these experimental methods played active roles in promoting the research progress on understanding the biological functions and the identification of RNA modifications, they are still labor-intensive. As excellent complements to experimental techniques, some computational methods (based on the high-resolution experimental data) have been developed to identify RNA modifications.7, 16, 17, 18, 19, 20, 21

Reminiscent of the regulation of gene expression by histone modifications, different kinds of RNA modifications may also act collectively to mediate biological functions.8 Unfortunately, to the best of our knowledge, no computational tool is available for dealing with a system that simultaneously contains several different kinds of RNA modifications. Such multi-modification systems may well contain many more features worthy of exploration.

In view of this, the present study was initiated in an attempt to fill such a void by establishing a seamless package or platform that can be used to analyze a biological system that simultaneously contains the three well known types of RNA modifications: m1A, m6A, and m5C (Figure 1).

Figure 1.


A Schematic Drawing to Show the Three Types of Modifications that May Simultaneously Occur in an RNA Sequence

Three types of modifications (m1A, m6A, and m5C) are shown.

Results and Discussion

By incorporating collective effects of nucleotides into PseKNC,22, 23 a seamless platform called “iRNA-PseColl” has been developed for identifying the occurrence sites of different RNA modifications.

It has been observed by the most rigorous cross-validation, the jackknife test,24 that the success rates achieved by the new predictor are quite high for the three different types of RNA modification sites, respectively (Table 1).

Table 1.

The Success Rates Obtained by the Proposed Model in Identifying Three Different Types of RNA Modification Sites

Modification Type    Metrics (a)
                     Sn (%)    Sp (%)    Acc (%)    MCC
(1) m1A              98.38     99.89     99.13      0.98
(2) m6A              81.86     99.11     90.38      0.82
(3) m5C              75.83     79.17     77.50      0.55

The results were obtained by the jackknife tests on the three benchmark datasets given in Supplemental Materials and Methods, respectively. Acc, overall accuracy; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity.

(a) See Equation 13 and the relevant text for the definition of metrics.

Because iRNA-PseColl is the first platform predictor developed for simultaneously identifying three different types of RNA modification sites based on sequence information alone, its power cannot be demonstrated by comparison with a counterpart; no such counterpart yet exists for exactly the same purpose. Nevertheless, as we can see from Table 1, the scores are high, particularly the overall accuracy (Acc; 77.50%–99.13%) and the Matthews correlation coefficient (MCC; 0.55–0.98).

Let us use graphic analysis to further demonstrate the proposed platform’s quality. As it is, the graphical approach is a useful vehicle for studying complicated biological systems because it can provide intuitive insights, as demonstrated by a series of previous studies.25, 26, 27, 28, 29, 30, 31, 32, 33, 34 Therefore, it would be instructive and illuminative to give an intuitive illustration for the current study as well. To realize this, the graph of receiver operating characteristic (ROC)35, 36 was adopted as shown in Figure 2, where the ROC curves for the current method in identifying m1A, m6A, and m5C modifications were given, respectively. The best possible prediction method would yield a point with the coordinate (0, 1) representing 100% sensitivity and 0 false-positive rate or 100% specificity. Therefore, the (0, 1) point is also called a perfect classification. A completely random guess would give a point along a diagonal from the point (0, 0) to (1, 1). The area under the ROC curve, also called AUROC, is used to indicate the performance quality of the classifier: the value 0.5 of AUROC is equivalent to random prediction while 1 of AUROC represents a perfect one. The AUROC for the case of m1A, m6A, or m5C is 0.998, 0.849, or 0.911, respectively, indicating that the proposed platform is quite promising, holding very high potential to become a useful high throughput tool for genome analyses.

Figure 2.


A Graphical Illustration to Show the Performances of iRNA-PseColl in Identifying m1A, m6A, and m5C Modification Sites, Respectively

The performances are illustrated by means of the ROC curves.35, 36 The area under the ROC curve is called AUROC. The greater the AUROC value is, the better the performance will be. See the text for further explanation.
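The ROC construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names `roc_points` and `auroc` and the toy scores are hypothetical, and decision scores are assumed to be larger for samples that are more likely to be positive. The sketch sweeps a threshold over the scores, records (false-positive rate, sensitivity) pairs, and integrates them with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative samples are required")
    # Rank samples from the highest to the lowest decision score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auroc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (hypothetical scores): two positives and two negatives.
print(auroc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```

A perfect ranking gives an area of 1, and scores that carry no information give 0.5, in line with the interpretation of AUROC in the text.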

As demonstrated in a series of recent publications,20, 21, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 providing a publicly accessible web server significantly enhances the impact of a paper; this is particularly true for papers aimed at developing novel prediction methods.54 Accordingly, the web server for the current platform has been established. Moreover, for the convenience of the scientific community, a user guide is given in the Supplemental Materials and Methods.

Materials and Methods

According to Chou’s55 five-step guidelines, which have been followed by many investigators in a series of recent publications,21, 39, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 56, 57, 58, 59, 60, 61, 62, 63 to develop a new prediction method that can be easily used by most experimental scientists and can also inspire theoretical scientists to develop other relevant prediction methods, the following five procedures should be made very clear: (1) how to construct or select a valid benchmark dataset to train and test the prediction model, (2) how to represent a biological sequence sample with a mathematical formulation or vector that is really correlated with the target concerned, (3) how to introduce or develop a powerful engine (or algorithm) to run the prediction model, (4) how to properly perform the cross-validation tests to objectively evaluate the anticipated accuracy, and (5) how to design a user-friendly web server to make it easy for people to get their desired results. Below, we elaborate on the five procedures in establishing the new predictor.

Benchmark Dataset

Owing to the fast development of high-throughput experimental techniques, the experimentally confirmed m1A, m6A, and m5C modification data is available for the human genome.9, 10, 13, 15 By mapping the experimental data to the human genome, the sequence samples with statistical significance were obtained for the three kinds of RNA modification sites as well. For facilitating the formulation, let us use the following scheme to represent a potential RNA modification-site-containing sample

$$R_\xi(\mathbb{N}) = N_{-\xi} N_{-(\xi-1)} \cdots N_{-2} N_{-1}\, \mathbb{N}\, N_{+1} N_{+2} \cdots N_{+(\xi-1)} N_{+\xi},$$ (Equation 1)

where the symbol $\mathbb{N}$ denotes the single nucleic acid code A (adenine) or C (cytosine), the subscript $\xi$ is an integer, $N_{-\xi}$ represents the $\xi$-th upstream nucleotide from the center, $N_{+\xi}$ represents the $\xi$-th downstream nucleotide, and so forth. The $(2\xi+1)$-tuple RNA sample, $R_\xi(\mathbb{N})$, can be further classified into the following two categories:

$$R_\xi(\mathbb{N}) \in \begin{cases} R_\xi^{+}(\mathbb{N}), & \text{if its center can be modified (m1A, m6A, or m5C)} \\ R_\xi^{-}(\mathbb{N}), & \text{otherwise} \end{cases},$$ (Equation 2)

where $R_\xi^{+}(\mathbb{N})$ denotes a true modification segment with A or C at its center, $R_\xi^{-}(\mathbb{N})$ denotes a false modification segment with A or C at its center, and the symbol $\in$ means “a member of” in set theory.

In the literature, a benchmark dataset usually consists of a training dataset and a testing dataset: the former is used to train a model and the latter to test it. However, as elucidated in a comprehensive review,64 there is no need to artificially separate a benchmark dataset into these two parts if the prediction model is examined by the jackknife test or subsampling (K-fold) cross-validation, because the outcome thus obtained is actually a combination of many different independent dataset tests. Thus, the benchmark datasets for the current study can be further formulated as

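$$\begin{cases} S_\xi(\mathrm{m1A}) = S_\xi^{+}(\mathrm{m1A}) \cup S_\xi^{-}(\mathrm{m1A}), & \text{when } \mathbb{N} = \mathrm{A} \\ S_\xi(\mathrm{m6A}) = S_\xi^{+}(\mathrm{m6A}) \cup S_\xi^{-}(\mathrm{m6A}), & \text{when } \mathbb{N} = \mathrm{A} \\ S_\xi(\mathrm{m5C}) = S_\xi^{+}(\mathrm{m5C}) \cup S_\xi^{-}(\mathrm{m5C}), & \text{when } \mathbb{N} = \mathrm{C} \end{cases},$$ (Equation 3)

where $\mathbb{N}$ is the center nucleotide of Equation 1, the positive subset $S_\xi^{+}(\mathrm{m1A})$ only contains those RNA samples that can have m1A modification, the negative subset $S_\xi^{-}(\mathrm{m1A})$ only contains those RNA samples that cannot have m1A modification, $\cup$ denotes the symbol of “union” in set theory,64 and so forth.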

The benchmark datasets were derived from the RNA sequences in the human genome that have experimentally confirmed m1A, m6A, and m5C modification sites.9, 10, 13, 15 The detailed procedures to construct the benchmark datasets are as follows. First, as done in Chou,65 by sliding the $(2\xi+1)$-tuple nucleotide window (Figure 3) along each of the aforementioned RNA sequences, only those RNA segments with A or C at the center were collected. Second, if the number of nucleotides upstream or downstream of the center was less than $\xi$ (i.e., the center lay within $\xi$ positions of the start of the sequence or beyond position $L-\xi$, where $L$ is the length of the RNA sequence concerned), the lacking code was filled with the same code as its nearest neighbor. Third, the RNA segment samples thus obtained were put into the positive subset $S_\xi^{+}(\mathrm{m1A})$, $S_\xi^{+}(\mathrm{m6A})$, or $S_\xi^{+}(\mathrm{m5C})$ if their centers were experimentally annotated as m1A, m6A, or m5C sites; otherwise, into the corresponding negative subset $S_\xi^{-}(\mathrm{m1A})$, $S_\xi^{-}(\mathrm{m6A})$, or $S_\xi^{-}(\mathrm{m5C})$. Fourth, to reduce redundancy and bias, none of the included RNA segments had pairwise sequence identity with any other in the same subset. By strictly following the above procedures, we obtained an array of benchmark datasets with different $\xi$ values and hence different lengths of RNA samples $(2\xi+1)$ as well (see Equation 1), as illustrated below

$$S_\xi(\mathbb{N}) \Rightarrow \begin{cases} 23 \text{ nucleotides}, & \text{when } \xi = 11 \\ 25 \text{ nucleotides}, & \text{when } \xi = 12 \\ 27 \text{ nucleotides}, & \text{when } \xi = 13 \\ \quad\vdots & \quad\vdots \\ 39 \text{ nucleotides}, & \text{when } \xi = 19 \\ 41 \text{ nucleotides}, & \text{when } \xi = 20 \\ 43 \text{ nucleotides}, & \text{when } \xi = 21 \end{cases},$$ (Equation 4)

where the symbol $\Rightarrow$ means “formed by.” It was observed via preliminary tests as well as many reports19, 43, 66 that when $\xi = 20$ (i.e., the RNA samples are formed by 41 nucleotides [nt]), the corresponding results were most promising. Accordingly, hereafter we only consider the 41-nt RNA sequences. By doing so, we obtained 6,366, 1,130, and 120 sequence samples for the positive subsets $S_\xi^{+}(\mathrm{m1A})$, $S_\xi^{+}(\mathrm{m6A})$, and $S_\xi^{+}(\mathrm{m5C})$, respectively. The numbers of samples thus obtained in the corresponding negative subsets are much greater, and hence the benchmark datasets would be very imbalanced. Using such highly skewed benchmark datasets to train predictors would lead to the outcome that many positive cases might be mispredicted as negative ones.42, 44, 56, 67 To balance the sizes of the positive and negative subsets, we randomly picked out 6,366, 1,130, and 120 samples from the corresponding negative samples to form the negative subsets $S_\xi^{-}(\mathrm{m1A})$, $S_\xi^{-}(\mathrm{m6A})$, and $S_\xi^{-}(\mathrm{m5C})$, respectively, as done in Chen et al.16 and Feng et al.19
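The balancing step amounts to random undersampling of the negative pool. The sketch below is a minimal Python illustration under stated assumptions: the function name `balance_by_undersampling` and the `seed` parameter are hypothetical, and the source does not say how the random selection was implemented or seeded.

```python
import random
from typing import List, Sequence, Tuple


def balance_by_undersampling(
    positives: Sequence[str], negatives: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Keep all positives and draw an equally sized random subset of negatives."""
    if len(negatives) < len(positives):
        raise ValueError("not enough negative samples to match the positives")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    # random.Random.sample draws without replacement.
    chosen = rng.sample(list(negatives), len(positives))
    return list(positives), chosen
```

With 120 positive m5C samples, for example, this would return 120 randomly chosen negatives, giving a 240-sample balanced dataset. A different seed yields a different negative subset, so reported success rates depend to some extent on this draw.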

Figure 3.


An Illustration to Show the Process of Collecting the RNA Samples by Sliding the (2ξ + 1) -nt Scaled Window along an RNA Sequence

Adapted from Chou65 with permission. See the text for further explanation.
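The sliding-window collection and end padding described in the text can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `collect_windows` is hypothetical, and “filled with the same code of its nearest neighbor” is read here as repeating the terminal nucleotide of the sequence, which is an assumption about the intended padding rule.

```python
from typing import List, Tuple


def collect_windows(rna: str, xi: int, centers: str = "AC") -> List[Tuple[int, str]]:
    """Return (0-based center index, (2*xi + 1)-nt segment) for every center in `centers`."""
    if not rna:
        return []
    last = len(rna) - 1
    windows = []
    for i, base in enumerate(rna):
        if base not in centers:
            continue
        # Positions outside the sequence are clamped to the nearest end,
        # which repeats the terminal nucleotide as padding (assumed rule).
        segment = "".join(
            rna[min(max(k, 0), last)] for k in range(i - xi, i + xi + 1)
        )
        windows.append((i, segment))
    return windows


# Toy example with xi = 2 (5-nt windows); the study itself uses xi = 20.
print(collect_windows("CACGUC", 2))
# [(0, 'CCCAC'), (1, 'CCACG'), (2, 'CACGU'), (5, 'GUCCC')]
```

Each returned segment would then be assigned to a positive or negative subset according to whether its center is an experimentally annotated modification site.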

Finally, the detailed RNA sequence samples thus obtained for the benchmark datasets $S_{\xi=20}(\mathrm{m1A})$, $S_{\xi=20}(\mathrm{m6A})$, and $S_{\xi=20}(\mathrm{m5C})$ are given in the Supplemental Materials and Methods, which can also be directly downloaded from http://lin.uestc.edu.cn/server/iRNA-PseColl/dataset.htm.

Formulating RNA Sequence Samples

One of the most challenging problems in computational biology today is how to formulate a biological sequence with a vector that can reflect its key pattern important for the function or mechanism concerned. The importance of such a challenge is due to the fact that nearly all the existing machine-learning algorithms were developed to handle vectors rather than sequence samples, as elucidated in a review article.54 Unfortunately, a vector defined in a discrete model may lose many important sequence pattern features. To deal with such a problem for protein/peptide sequences, the pseudo amino acid composition (PseAAC)68, 69, 70, 71, 72 was developed. Ever since it was introduced, the concept of PseAAC has penetrated into nearly all the areas of computational proteomics (see a long list of references cited in two review papers55, 73). Inspired by the concept of PseAAC and encouraged by its great successes, the pseudo K-tuple nucleotide composition (PseKNC)22, 74, 75, 76 was proposed and has been increasingly used in various fields of genome analysis.20, 21, 23, 37, 39, 40, 42, 43, 51, 52, 53, 58, 59, 60, 77, 78, 79, 80, 81, 82, 83, 84, 85 With both PseAAC and PseKNC being increasingly and widely used, it is highly desirable to have a seamless package that can generate various modes of PseAAC and PseKNC according to users’ needs for protein/peptide and DNA/RNA sequences, respectively. This was exactly the driving force behind establishing the web server called Pse-in-One.86

The general form of PseKNC for an RNA sequence sample is given by23

$$\mathbf{R} = [\phi_1 \;\; \phi_2 \;\; \cdots \;\; \phi_u \;\; \cdots \;\; \phi_\Gamma]^{\mathbf{T}},$$ (Equation 5)

where $\mathbf{T}$ is the transpose operator, the subscript $\Gamma$ is an integer, and its value as well as the components $\phi_u$ $(u = 1, 2, \cdots, \Gamma)$ depend on how the desired features are extracted from the RNA sequence sample. In order to make Equation 5 reflect both the local features of the individual constituent nucleotides and their collective effect, let us define the components in Equation 5 using the following two approaches.

Local Features of Individual Nucleotides

RNA consists of four types of nucleotides: A (adenosine), C (cytidine), G (guanosine), and U (uridine). They can be classified in three different ways (Table 2): (1) by ring number, A and G have two rings, whereas C and U have only one; (2) by chemical functionality, A and C belong to the amino group, whereas G and U belong to the keto group; and (3) by hydrogen bonding, C and G pair with each other through three hydrogen bonds, whereas A and U pair through only two (Figure 4). All these properties would have different impacts on RNA’s low-frequency internal motion87, 88 and its biological function.89, 90, 91

Figure 4.


Illustration to Show the Structure of Paired Nucleic Acid Residues

Left: A-U pair bonded to each other with two hydrogen bonds. Right: G-C pair with three hydrogen bonds. Adapted from Chou87 with permission.

To reflect the aforementioned features, let us denote the i-th nucleotide of Equation 1 by92, 93

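$$N_i = (x_i, y_i, z_i),$$ (Equation 6)

where $x_i$, $y_i$, and $z_i$ refer to the attributes of (1) ring structure, (2) functional group, and (3) hydrogen bonding in Table 2, respectively. Accordingly, the nucleotide A can be formulated as (1, 1, 1), C as (0, 1, 0), G as (1, 0, 0), and U as (0, 0, 1); or generally we have

$$x_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{G}\} \\ 0, & \text{if } N_i \in \{\mathrm{C}, \mathrm{U}\} \end{cases}; \quad y_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{C}\} \\ 0, & \text{if } N_i \in \{\mathrm{G}, \mathrm{U}\} \end{cases}; \quad z_i = \begin{cases} 1, & \text{if } N_i \in \{\mathrm{A}, \mathrm{U}\} \\ 0, & \text{if } N_i \in \{\mathrm{C}, \mathrm{G}\} \end{cases}.$$ (Equation 7)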

Table 2.

Classification of Nucleotides

Angle of View           Attribute     Nucleotides
(1) Ring structure      purine        A, G
                        pyrimidine    C, U
(2) Functional group    amino         A, C
                        keto          G, U
(3) Hydrogen bonding    stronger      C, G
                        weaker        A, U

See Local Features of Individual Nucleotides for further explanation.
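The three-attribute coding of Equations 6 and 7 maps directly onto a small lookup. The following Python sketch is only an illustration; the function name `encode_properties` is hypothetical.

```python
from typing import Tuple


def encode_properties(nucleotide: str) -> Tuple[int, int, int]:
    """Return (x, y, z) for one RNA nucleotide following Equation 7."""
    if len(nucleotide) != 1 or nucleotide not in "ACGU":
        raise ValueError("expected one of A, C, G, U")
    x = 1 if nucleotide in "AG" else 0  # ring structure: purine = 1
    y = 1 if nucleotide in "AC" else 0  # functional group: amino = 1
    z = 1 if nucleotide in "AU" else 0  # hydrogen bonding: weaker (A-U) = 1
    return x, y, z


for base in "ACGU":
    print(base, encode_properties(base))
# A (1, 1, 1)
# C (0, 1, 0)
# G (1, 0, 0)
# U (0, 0, 1)
```

The four codes are pairwise distinct, so the triple identifies the nucleotide unambiguously.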

Collective Features of the Constituent Nucleotides

There are some methods to reflect the coupling of a biological sequence or the collective effect of its constituent elements, such as the conditional probability approach,94 degenerate Kmer strategy,40 and g-gap dipeptide mode.41 In this study, we would like to use a different approach; i.e., consider the occurrence frequency of a nucleotide not only for its local site but also for its distribution along the sequence of an RNA sample, as defined by the following equation

$$D_i = \frac{1}{L_i} \sum_{j=1}^{L_i} f(N_j),$$ (Equation 8)

where $D_i$ is the density of the nucleotide $N_i$ at site $i$ of an RNA sequence, $L_i$ is the length of the sliding substring concerned (the first $i$ nucleotides of the sequence, as in the example that follows), $j$ denotes each of the site locations counted in the substring, and

$$f(N_j) = \begin{cases} 1, & \text{if } N_j = \text{the nucleotide concerned} \\ 0, & \text{otherwise} \end{cases}.$$ (Equation 9)

For instance, suppose an RNA sequence is “CACGUC.” The density of “A” at sequence position 1, 2, 3, 4, 5, or 6 is 0 = 0/1, 0.5 = 1/2, 0.33 ≈ 1/3, 0.25 = 1/4, 0.20 = 1/5, or 0.16 ≈ 1/6, respectively; that of “C” is 1 = 1/1, 0.5 = 1/2, 0.66 ≈ 2/3, 0.5 = 2/4, 0.4 = 2/5, or 0.5 = 3/6, respectively; and so forth.
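The density of Equation 8 can be computed with a running count. The sketch below is a minimal Python illustration in which the sliding substring at site i is taken to be the first i nucleotides, as the worked “A” values imply; the function name `prefix_density` is hypothetical. Under this reading, the density of “C” at position 2 of “CACGUC” is 1/2, because the prefix “CA” contains one C.

```python
from typing import List


def prefix_density(rna: str, nucleotide: str) -> List[float]:
    """Density of `nucleotide` within rna[:i] for i = 1 .. len(rna)."""
    densities = []
    count = 0
    for i, base in enumerate(rna, start=1):
        if base == nucleotide:  # f(N_j) = 1 for the nucleotide concerned
            count += 1
        densities.append(count / i)  # divide by the prefix length L_i = i
    return densities


print([round(d, 2) for d in prefix_density("CACGUC", "A")])
# [0.0, 0.5, 0.33, 0.25, 0.2, 0.17]
print([round(d, 2) for d in prefix_density("CACGUC", "C")])
# [1.0, 0.5, 0.67, 0.5, 0.4, 0.5]
```

The printed values are rounded to two decimals, so 1/6 and 2/3 appear as 0.17 and 0.67 rather than the truncated 0.16 and 0.66 used in the text.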

By combining Equations 6–9, the i-th nucleotide of Equation 1 can be uniquely defined by a set of four variables; i.e.,

$$N_i = (x_i, y_i, z_i, D_i).$$ (Equation 10)

For example, the RNA sequence “CACGUC” can be expressed by the following six sets of numbers: (0, 1, 0, 1), (1, 1, 1, 0.5), (0, 1, 0, 0.66), (1, 0, 0, 0.25), (0, 0, 1, 0.2), and (0, 1, 0, 0.5). Substituting these numbers into Equation 5, we have

$$\mathbf{R}(\mathrm{CACGUC}) = [0 \; 1 \; 0 \; 1 \;\; 1 \; 1 \; 1 \; 0.5 \;\; 0 \; 1 \; 0 \; 0.66 \;\; 1 \; 0 \; 0 \; 0.25 \;\; 0 \; 0 \; 1 \; 0.2 \;\; 0 \; 1 \; 0 \; 0.5]^{\mathbf{T}},$$ (Equation 11)

meaning that the 6-nt example can be defined by a 6 × 4 = 24-D (dimensional) PseKNC vector.

Accordingly, all the samples in the current benchmark datasets (Supplemental Materials and Methods) can be formulated with a 41×4=164 -D vector.
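Putting the two kinds of features together, a whole sample can be turned into its feature vector in one pass. The following is a minimal Python sketch, assuming the prefix reading of the density used in the “CACGUC” example; the function name `encode_sequence` is hypothetical and this is not the authors' code.

```python
from typing import Dict, List


def encode_sequence(rna: str) -> List[float]:
    """Concatenate (x_i, y_i, z_i, D_i) over all sites of an RNA sample."""
    vector: List[float] = []
    counts: Dict[str, int] = {}
    for i, base in enumerate(rna, start=1):
        if base not in "ACGU":
            raise ValueError("unexpected character: %r" % base)
        counts[base] = counts.get(base, 0) + 1
        x = 1.0 if base in "AG" else 0.0  # ring structure
        y = 1.0 if base in "AC" else 0.0  # functional group
        z = 1.0 if base in "AU" else 0.0  # hydrogen bonding
        d = counts[base] / i              # density of this base in rna[:i]
        vector.extend([x, y, z, d])
    return vector


vec = encode_sequence("CACGUC")
print(len(vec))  # 24
print(vec[:8])   # [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.5]
```

A 41-nt sample passed through the same function yields 41 × 4 = 164 components, the input dimension stated for the benchmark datasets.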

Operation Engine

The prediction was carried out by a support vector machine (SVM), which has been widely used in various areas of bioinformatics and computational biology.20, 40, 42, 59, 67, 77, 78, 79, 80, 81, 95, 96, 97, 98, 99, 100, 101, 102, 103 Its basic idea has been elaborated in the aforementioned papers, and there is no need to repeat it here.

In the current study, the LibSVM package 3.18 was used to implement SVM, which can be downloaded for free from http://www.csie.ntu.edu.tw/~cjlin/libsvm/. The SVM algorithm contains two parameters to be determined: one is the regularization parameter $C$ and the other is the kernel width parameter $\gamma$. They were optimized using the grid search approach as described by

$$\begin{cases} 2^{-5} \le C \le 2^{15} & \text{with step } \Delta C = 2 \\ 2^{-15} \le \gamma \le 2^{-5} & \text{with step } \Delta\gamma = 2^{-1} \end{cases},$$ (Equation 12)

where $\Delta C$ and $\Delta\gamma$ represent the step gaps for $C$ and $\gamma$, respectively.
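The search grid can be enumerated as below. This is a minimal Python sketch under a stated assumption: the steps are read as multiplicative factors, so that C doubles from 2^-5 up to 2^15 and γ halves from 2^-5 down to 2^-15; the source does not spell this out. The names `svm_parameter_grid` and `grid_search` are hypothetical, and `score` stands for any user-supplied function returning a cross-validated success rate for a given (C, γ) pair, for example one obtained by training LibSVM with those parameters.

```python
from typing import Callable, List, Tuple


def svm_parameter_grid() -> List[Tuple[float, float]]:
    """All (C, gamma) pairs on the assumed power-of-two grid."""
    c_values = [2.0 ** e for e in range(-5, 16)]           # 2^-5 ... 2^15, factor 2
    gamma_values = [2.0 ** e for e in range(-5, -16, -1)]  # 2^-5 ... 2^-15, factor 1/2
    return [(c, g) for c in c_values for g in gamma_values]


def grid_search(score: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return the (C, gamma) pair with the highest score."""
    return max(svm_parameter_grid(), key=lambda pair: score(pair[0], pair[1]))


print(len(svm_parameter_grid()))  # 231
```

Under this reading the grid holds 21 values of C and 11 values of γ, i.e., 231 candidate pairs, each of which has to be evaluated by cross-validation.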

For those readers who are interested in knowing more about SVM, see Chou and Cai104 and Cai et al.105 or a monograph106 where a brief introduction or detailed description were given, respectively.

The platform predictor obtained via the aforementioned procedures is called “iRNA-PseColl,” where “i” stands for “identify,” “Pse” for “pseudo component approach,” and “Coll” for “collective effects of nucleotides.”

Quality Control or Examination

Quality control is a very important process in industry; it is even more important for a predictor. To deal with this problem, we need to address the following two issues: (1) what metrics should be adopted to measure the predictor’s quality, and (2) what test method should be used to calculate the metrics. Below, we address these two issues.

A Set of Four Intuitive Metrics

The current prediction belongs to the category of “binary classification,” which is widely encountered in genome analyses. To quantitatively evaluate the quality of a binary classification predictor, a set of four metrics is usually used in the literature107: (1) sensitivity (Sn), (2) specificity (Sp), (3) overall accuracy (Acc), and (4) Matthews correlation coefficient (MCC), which reflects the predictor’s stability. Unfortunately, the conventional formulations of these metrics were taken directly from the mathematical literature and are not intuitive; most biologists have difficulty understanding them, particularly MCC. Fortunately, using the symbols that Chou108, 111 introduced in studying signal peptides, two subsequent studies109, 110 derived a set of four equations that are equivalent to the conventional ones but much more intuitive, as formulated below:

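$$\begin{cases} \mathrm{Sn} = 1 - \dfrac{N_-^+}{N^+}, & 0 \le \mathrm{Sn} \le 1 \\ \mathrm{Sp} = 1 - \dfrac{N_+^-}{N^-}, & 0 \le \mathrm{Sp} \le 1 \\ \mathrm{Acc} = \Lambda = 1 - \dfrac{N_-^+ + N_+^-}{N^+ + N^-}, & 0 \le \mathrm{Acc} \le 1 \\ \mathrm{MCC} = \dfrac{1 - \left( \dfrac{N_-^+}{N^+} + \dfrac{N_+^-}{N^-} \right)}{\sqrt{\left( 1 + \dfrac{N_+^- - N_-^+}{N^+} \right) \left( 1 + \dfrac{N_-^+ - N_+^-}{N^-} \right)}}, & -1 \le \mathrm{MCC} \le 1 \end{cases},$$ (Equation 13)

where $N^+$ represents the total number of positive samples investigated, $N_-^+$ is the number of positive samples incorrectly predicted to be negative, $N^-$ is the total number of negative samples investigated, and $N_+^-$ is the number of negative samples incorrectly predicted to be positive.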
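The four metrics of Equation 13 are straightforward to evaluate once the four counts are known. The sketch below is a minimal Python illustration; the function name `chou_metrics` and its argument names are hypothetical. The error counts in the example (29 missed positives and 25 missed negatives out of 120 each) are not reported in the source; they are back-calculated here from the m5C row of Table 1 purely to check the formulas.

```python
import math
from typing import Dict


def chou_metrics(n_pos: int, pos_missed: int, n_neg: int, neg_missed: int) -> Dict[str, float]:
    """Sn, Sp, Acc, and MCC from sample totals and misprediction counts."""
    sn = 1 - pos_missed / n_pos
    sp = 1 - neg_missed / n_neg
    acc = 1 - (pos_missed + neg_missed) / (n_pos + n_neg)
    denom = math.sqrt(
        (1 + (neg_missed - pos_missed) / n_pos)
        * (1 + (pos_missed - neg_missed) / n_neg)
    )
    # MCC is undefined when every sample is assigned to one class.
    mcc = (1 - (pos_missed / n_pos + neg_missed / n_neg)) / denom if denom else float("nan")
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts inferred from Table 1 (m5C): N+ = N- = 120.
result = chou_metrics(120, 29, 120, 25)
print({name: round(value, 4) for name, value in result.items()})
# {'Sn': 0.7583, 'Sp': 0.7917, 'Acc': 0.775, 'MCC': 0.5503}
```

These values agree with the m5C entries of Table 1 (75.83%, 79.17%, 77.50%, and 0.55), which is consistent with the reconstruction of the counts but does not prove it.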

With the metrics of Equation 13, the meanings of Sn, Sp, Acc, and MCC become much clearer, as discussed and used in a series of follow-up studies in many different areas.20, 21, 38, 40, 42, 44, 45, 46, 47, 48, 49, 56, 57, 61, 67, 80, 82, 84, 97, 99, 112, 113, 114, 115 It is instructive to point out that more and more multi-label sequence samples have been emerging in systems biology and medicine.49, 116, 117, 118, 119 To deal with this kind of multi-label system, a much more sophisticated set of metrics is needed, as elaborated in Chou.120

Jackknife Validation

Three different cross-validation methods are often adopted in the literature24: (1) the independent dataset test, (2) the subsampling (or K-fold cross-validation) test, and (3) the jackknife test. However, as elucidated in Chou,55 among these three choices the jackknife test has been demonstrated to be the least arbitrary, always yielding a unique outcome for a given benchmark dataset. Therefore, the jackknife test has been widely recognized and increasingly adopted by researchers to analyze the quality of various predictors.83, 103, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131 In view of this, we also used the jackknife test to examine the quality of the current prediction method. In the jackknife test, each sample in the benchmark dataset is in turn singled out as the test sample and predicted by the model trained on all the remaining samples. The jackknife test can therefore exclude the “memory” effect, because both the training dataset and testing dataset in a jackknife system are actually open, and each sample is, in turn, moved between the two. The arbitrariness problem intrinsic to the independent dataset and subsampling tests55 no longer exists, because the outcome derived via the jackknife test for a predictor is always the same on a given benchmark dataset.
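The jackknife procedure can be sketched generically as a leave-one-out loop. The following Python illustration is not the authors' pipeline: `jackknife_predictions` is a hypothetical helper, and a toy nearest-centroid classifier stands in for the SVM so that the example runs with the standard library alone.

```python
from typing import Callable, Dict, List


def nearest_centroid_fit(X: List[List[float]], y: List[int]) -> Dict[int, List[float]]:
    """Toy stand-in for the SVM: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def jackknife_predictions(X: List[List[float]], y: List[int],
                          fit: Callable, predict: Callable) -> List[int]:
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # leave sample i out of training
        train_y = y[:i] + y[i + 1:]
        model = fit(train_X, train_y)
        predictions.append(predict(model, X[i]))
    return predictions


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
print(jackknife_predictions(X, y, nearest_centroid_fit, nearest_centroid_predict))
# [0, 0, 0, 1, 1, 1]
```

Because the loop visits every sample exactly once and involves no random split, repeating it on the same benchmark dataset with a deterministic learner always gives the same predictions, which is the uniqueness property emphasized in the text. The cost is one model fit per sample.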

Author Contributions

W.C., H.L., and K.-C.C. conceived and designed the study. P.F. and H.D. conducted the experiments. P.F., H.D., and W.C. implemented the algorithms. H.Y. established the web server. W.C., H.L., and K.-C.C. performed the analysis and wrote the paper. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

The authors wish to thank the two anonymous reviewers for their constructive comments, which were very helpful for strengthening the presentation of this paper. This work was supported by the Program for the Top Young Innovative Talents of Higher Learning Institutions of Hebei Province (BJ2014028), the Outstanding Youth Foundation of North China University of Science and Technology (JP201502), the China Postdoctoral Science Foundation (2015M582533), and the Fundamental Research Funds for the Central Universities, China (ZYGX2015J144 and ZYGX2015Z006).

Footnotes

Supplemental Information includes Supplemental Materials and Methods and can be found with this article online at http://dx.doi.org/10.1016/j.omtn.2017.03.006.

Contributor Information

Wei Chen, Email: chenweiimu@gmail.com.

Hao Lin, Email: hlin@uestc.edu.cn.

Kuo-Chen Chou, Email: kcchou@gordonlifescience.org.

Supplemental Information

Document S1. Supplemental Materials and Methods
Document S2. Article plus Supplemental Information

References

1. Davis F.F., Allen F.W. Ribonucleic acids from yeast which contain a fifth nucleotide. J. Biol. Chem. 1957;227:907–915.
2. Machnicka M.A., Milanowska K., Osman Oglou O., Purta E., Kurkowska M., Olchowik A., Januszewski W., Kalinowski S., Dunin-Horkawicz S., Rother K.M. MODOMICS: a database of RNA modification pathways--2013 update. Nucleic Acids Res. 2013;41:D262–D267. doi: 10.1093/nar/gks1007.
3. Meyer K.D., Jaffrey S.R. The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat. Rev. Mol. Cell Biol. 2014;15:313–326. doi: 10.1038/nrm3785.
4. Nilsen T.W. Molecular biology. Internal mRNA methylation finally finds functions. Science. 2014;343:1207–1208. doi: 10.1126/science.1249340.
5. Peifer C., Sharma S., Watzinger P., Lamberth S., Kötter P., Entian K.D. Yeast Rrp8p, a novel methyltransferase responsible for m1A 645 base modification of 25S rRNA. Nucleic Acids Res. 2013;41:1151–1163. doi: 10.1093/nar/gks1102.
6. Ballesta J.P., Cundliffe E. Site-specific methylation of 16S rRNA caused by pct, a pactamycin resistance determinant from the producing organism, Streptomyces pactum. J. Bacteriol. 1991;173:7213–7218.
7. Chen T., Hao Y.J., Zhang Y., Li M.M., Wang M., Han W., Wu Y., Lv Y., Hao J., Wang L. m(6)A RNA methylation is regulated by microRNAs and promotes reprogramming to pluripotency. Cell Stem Cell. 2015;16:289–301. doi: 10.1016/j.stem.2015.01.016.
8. Hoernes T.P., Hüttenhofer A., Erlacher M.D. mRNA modifications: dynamic regulators of gene expression? RNA Biol. 2016;13:760–765. doi: 10.1080/15476286.2016.1203504.
9. Dominissini D., Nachtergaele S., Moshitch-Moshkovitz S., Peer E., Kol N., Ben-Haim M.S., Dai Q., Di Segni A., Salmon-Divon M., Clark W.C. The dynamic N(1)-methyladenosine methylome in eukaryotic messenger RNA. Nature. 2016;530:441–446. doi: 10.1038/nature16998.
10. Li X., Xiong X., Wang K., Wang L., Shu X., Ma S., Yi C. Transcriptome-wide mapping reveals reversible and dynamic N(1)-methyladenosine methylome. Nat. Chem. Biol. 2016;12:311–316. doi: 10.1038/nchembio.2040.
11. Cai L., Yuan W., Zhang Z., He L., Chou K.C. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data. Sci. Rep. 2016;6:36540. doi: 10.1038/srep36540.
12. Khoddami V., Cairns B.R. Transcriptome-wide target profiling of RNA cytosine methyltransferases using the mechanism-based enrichment procedure Aza-IP. Nat. Protoc. 2014;9:337–361. doi: 10.1038/nprot.2014.014.
13. Dominissini D., Moshitch-Moshkovitz S., Schwartz S., Salmon-Divon M., Ungar L., Osenberg S., Cesarkas K., Jacob-Hirsch J., Amariglio N., Kupiec M. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012;485:201–206. doi: 10.1038/nature11112.
14. Schwartz S., Agarwala S.D., Mumbach M.R., Jovanovic M., Mertins P., Shishkin A., Tabach Y., Mikkelsen T.S., Satija R., Ruvkun G. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell. 2013;155:1409–1421. doi: 10.1016/j.cell.2013.10.047.
15. Squires J.E., Patel H.R., Nousch M., Sibbritt T., Humphreys D.T., Parker B.J., Suter C.M., Preiss T. Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res. 2012;40:5023–5033. doi: 10.1093/nar/gks144.
16. Chen W., Feng P., Tang H., Ding H., Lin H. RAMPred: identifying the N(1)-methyladenosine sites in eukaryotic transcriptomes. Sci. Rep. 2016;6:31080. doi: 10.1038/srep31080.
17. Chen W., Tran H., Liang Z., Lin H., Zhang L. Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci. Rep. 2015;5:13859. doi: 10.1038/srep13859.
18. Chen W., Feng P., Tang H., Ding H., Lin H. Identifying 2′-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics. 2016;107:255–258. doi: 10.1016/j.ygeno.2016.05.003.
19. Feng P., Ding H., Chen W., Lin H. Identifying RNA 5-methylcytosine sites via pseudo nucleotide compositions. Mol. Biosyst. 2016;12:3307–3311. doi: 10.1039/c6mb00471g.
20. Chen W., Feng P., Ding H., Lin H., Chou K.C. iRNA-methyl: identifying N6-methyladenosine sites using pseudo nucleotide composition. Anal. Biochem. 2015;490:26–33. doi: 10.1016/j.ab.2015.08.021.
21. Liu Z., Xiao X., Yu D.J., Jia J., Qiu W.R., Chou K.C. pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties. Anal. Biochem. 2016;497:60–67. doi: 10.1016/j.ab.2015.12.017.
22. Chen W., Lei T.Y., Jin D.C., Lin H., Chou K.C. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 2014;456:53–60. doi: 10.1016/j.ab.2014.04.001.
23. Chen W., Lin H., Chou K.C. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol. Biosyst. 2015;11:2620–2634. doi: 10.1039/c5mb00155b.
24. Chou K.C., Zhang C.T. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol. 1995;30:275–349. doi: 10.3109/10409239509083488.
25. Chou K.C., Jiang S.P., Liu W.M., Fee C.H. Graph theory of enzyme kinetics: 1. Steady-state reaction system. Sci. Sin. 1979;22:341–358.
26. Chou K.C., Forsén S. Graphical rules for enzyme-catalysed rate laws. Biochem. J. 1980;187:829–835. doi: 10.1042/bj1870829.
27. Zhou G.P., Deng M.H. An extension of Chou’s graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways. Biochem. J. 1984;222:169–176. doi: 10.1042/bj2220169.
28. Chou K.C. Graphic rules in steady and non-steady state enzyme kinetics. J. Biol. Chem. 1989;264:12074–12079.
29. Althaus I.W., Gonzales A.J., Chou J.J., Romero D.L., Deibel M.R., Chou K.C., Kezdy F.J., Resnick L., Busso M.E., So A.G. The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase. J. Biol. Chem. 1993;268:14875–14880.
30. Althaus I.W., Chou J.J., Gonzales A.J., Deibel M.R., Chou K.C., Kezdy F.J., Romero D.L., Palmer J.R., Thomas R.C., Aristoff P.A. Kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-88204E. Biochemistry. 1993;32:6548–6554. doi: 10.1021/bi00077a008.
  • 31.Wu Z.C., Xiao X., Chou K.C. 2D-MH: A web-server for generating graphic representation of protein sequences based on the physicochemical properties of their constituent amino acids. J. Theor. Biol. 2010;267:29–34. doi: 10.1016/j.jtbi.2010.08.007. [DOI] [PubMed] [Google Scholar]
  • 32.Chou K.C., Lin W.Z., Xiao X. Wenxiang: a web-server for drawing wenxiang diagrams. Nat. Sci. 2011;3:862–865. [Google Scholar]
  • 33.Zhou G.P. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism. J. Theor. Biol. 2011;284:142–148. doi: 10.1016/j.jtbi.2011.06.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zhou G.P., Chen D., Liao S., Huang R.B. Recent progresses in studying helix-helix interactions in proteins by incorporating the Wenxiang diagram into the NMR spectroscopy. Curr. Top. Med. Chem. 2016;16:581–590. doi: 10.2174/1568026615666150819104617. [DOI] [PubMed] [Google Scholar]
  • 35.Fawcett T. An introduction to ROC analysis. Pattern Recognit. Lett. 2005;27:861–874. [Google Scholar]
  • 36.Davis, J., and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240.
  • 37.Zhang C.J., Tang H., Li W.C., Lin H., Chen W., Chou K.C. iOri-human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget. 2016;7:69783–69793. doi: 10.18632/oncotarget.11975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. iPPI-Esml: an ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J. Theor. Biol. 2015;377:47–56. doi: 10.1016/j.jtbi.2015.04.011. [DOI] [PubMed] [Google Scholar]
  • 39.Xiao X., Ye H.X., Liu Z., Jia J.H., Chou K.C. iROS-gPseKNC: predicting replication origin sites in DNA by incorporating dinucleotide position-specific propensity into general pseudo nucleotide composition. Oncotarget. 2016;7:34180–34189. doi: 10.18632/oncotarget.9057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Liu B., Fang L., Wang S., Wang X., Li H., Chou K.C. Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. J. Theor. Biol. 2015;385:153–159. doi: 10.1016/j.jtbi.2015.08.025. [DOI] [PubMed] [Google Scholar]
  • 41.Chen W., Ding H., Feng P., Lin H., Chou K.C. iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget. 2016;7:16895–16909. doi: 10.18632/oncotarget.7815. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Liu Z., Xiao X., Qiu W.R., Chou K.C. iDNA-methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal. Biochem. 2015;474:69–77. doi: 10.1016/j.ab.2014.12.009. [DOI] [PubMed] [Google Scholar]
  • 43.Chen W., Tang H., Ye J., Lin H., Chou K.C. iRNA-PseU: identifying RNA pseudouridine sites. Mol. Ther. Nucleic Acids. 2016;5:e332. doi: 10.1038/mtna.2016.37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. iSuc-PseOpt: identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal. Biochem. 2016;497:48–56. doi: 10.1016/j.ab.2015.12.009. [DOI] [PubMed] [Google Scholar]
  • 45.Qiu W.R., Sun B.Q., Xiao X., Xu Z.C., Chou K.C. iHyd-PseCp: identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget. 2016;7:44310–44321. doi: 10.18632/oncotarget.10027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Qiu W.R., Xiao X., Xu Z.C., Chou K.C. iPhos-PseEn: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget. 2016;7:51270–51283. doi: 10.18632/oncotarget.9987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J. Theor. Biol. 2016;394:223–230. doi: 10.1016/j.jtbi.2016.01.020. [DOI] [PubMed] [Google Scholar]
  • 48.Jia J., Zhang L., Liu Z., Xiao X., Chou K.C. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics. 2016;32:3133–3141. doi: 10.1093/bioinformatics/btw387. [DOI] [PubMed] [Google Scholar]
  • 49.Qiu W.R., Sun B.Q., Xiao X., Xu Z.C., Chou K.C. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics. 2016;32:3116–3123. doi: 10.1093/bioinformatics/btw380. [DOI] [PubMed] [Google Scholar]
  • 50.Meher P.K., Sahu T.K., Saini V., Rao A.R. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC. Sci. Rep. 2017;7:42362. doi: 10.1038/srep42362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Chen W., Feng P., Yang H., Ding H., Lin H., Chou K.C. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget. 2017;8:4208–4217. doi: 10.18632/oncotarget.13758. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Liu B., Wang S., Long R., Chou K.C. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics. 2017;33:35–41. doi: 10.1093/bioinformatics/btw539. [DOI] [PubMed] [Google Scholar]
  • 53.Liu B., Wu H., Zhang D., Wang X., Chou K.C. Pse-Analysis: a python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods. Oncotarget. 2017;8:4208–4217. doi: 10.18632/oncotarget.14524. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Chou K.C. Impacts of bioinformatics to medicinal chemistry. Med. Chem. 2015;11:218–234. doi: 10.2174/1573406411666141229162834. [DOI] [PubMed] [Google Scholar]
  • 55.Chou K.C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011;273:236–247. doi: 10.1016/j.jtbi.2010.12.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. iPPBS-Opt: a sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training datasets. Molecules. 2016;21:E95. doi: 10.3390/molecules21010095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC. Oncotarget. 2016;7:34558–34570. doi: 10.18632/oncotarget.9148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Liu B., Fang L., Liu F., Wang X., Chou K.C. iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach. J. Biomol. Struct. Dyn. 2016;34:223–235. doi: 10.1080/07391102.2015.1014422. [DOI] [PubMed] [Google Scholar]
  • 59.Liu B., Fang L., Long R., Lan X., Chou K.C. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics. 2016;32:362–369. doi: 10.1093/bioinformatics/btv604. [DOI] [PubMed] [Google Scholar]
  • 60.Liu B., Long R., Chou K.C. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics. 2016;32:2411–2418. doi: 10.1093/bioinformatics/btw186. [DOI] [PubMed] [Google Scholar]
  • 61.Qiu W., Sun B.Q., Xiao X., Chou K.C. iPhos-PseEvo: identifying human phosphorylated proteins by incorporating evolutionary information into general PseAAC via grey system theory. Mol. Inform. 2016 doi: 10.1002/minf.201600010. [DOI] [PubMed] [Google Scholar]
  • 62.Cheng X., Zhao S.G., Xiao X., Chou K.C. iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals. Bioinformatics. 2016;33:341–346. doi: 10.1093/bioinformatics/btw644. [DOI] [PubMed] [Google Scholar]
  • 63.Khan M., Hayat M., Khan S.A., Iqbal N. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou’s general PseAAC. J. Theor. Biol. 2017;415:13–19. doi: 10.1016/j.jtbi.2016.12.004. [DOI] [PubMed] [Google Scholar]
  • 64.Chou K.C., Shen H.B. Recent progress in protein subcellular location prediction. Anal. Biochem. 2007;370:1–16. doi: 10.1016/j.ab.2007.07.006. [DOI] [PubMed] [Google Scholar]
  • 65.Chou K.C. Prediction of signal peptides using scaled window. Peptides. 2001;22:1973–1979. doi: 10.1016/s0196-9781(01)00540-x. [DOI] [PubMed] [Google Scholar]
  • 66.Chen W., Tang H., Lin H. MethyRNA: a web server for identification of N6-methyladenosine sites. J. Biomol. Struct. Dyn. 2016;35:683–687. doi: 10.1080/07391102.2016.1157761. [DOI] [PubMed] [Google Scholar]
  • 67.Xiao X., Min J.L., Lin W.Z., Liu Z., Cheng X., Chou K.C. iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach. J. Biomol. Struct. Dyn. 2015;33:2221–2233. doi: 10.1080/07391102.2014.998710. [DOI] [PubMed] [Google Scholar]
  • 68.Chou K.C. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins. 2001;43:246–255. doi: 10.1002/prot.1035. [DOI] [PubMed] [Google Scholar]
  • 69.Chou K.C. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics. 2005;21:10–19. doi: 10.1093/bioinformatics/bth466. [DOI] [PubMed] [Google Scholar]
  • 70.Du P., Wang X., Xu C., Gao Y. PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions. Anal. Biochem. 2012;425:117–119. doi: 10.1016/j.ab.2012.03.015. [DOI] [PubMed] [Google Scholar]
  • 71.Cao D.S., Xu Q.S., Liang Y.Z. propy: a tool to generate various modes of Chou’s PseAAC. Bioinformatics. 2013;29:960–962. doi: 10.1093/bioinformatics/btt072. [DOI] [PubMed] [Google Scholar]
  • 72.Du P., Gu S., Jiao Y. PseAAC-General: fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets. Int. J. Mol. Sci. 2014;15:3495–3506. doi: 10.3390/ijms15033495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Chou K.C. Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Curr. Proteomics. 2009;6:262–274. [Google Scholar]
  • 74.Chen W., Zhang X., Brooker J., Lin H., Zhang L., Chou K.C. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics. 2015;31:119–120. doi: 10.1093/bioinformatics/btu602. [DOI] [PubMed] [Google Scholar]
  • 75.Liu B., Liu F., Fang L., Wang X., Chou K.C. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics. 2015;31:1307–1309. doi: 10.1093/bioinformatics/btu820. [DOI] [PubMed] [Google Scholar]
  • 76.Liu B., Liu F., Fang L., Wang X., Chou K.C. repRNA: a web server for generating various feature vectors of RNA sequences. Mol. Genet. Genomics. 2016;291:473–481. doi: 10.1007/s00438-015-1078-7. [DOI] [PubMed] [Google Scholar]
  • 77.Chen W., Feng P.M., Deng E.Z., Lin H., Chou K.C. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal. Biochem. 2014;462:76–83. doi: 10.1016/j.ab.2014.06.022. [DOI] [PubMed] [Google Scholar]
  • 78.Chen W., Feng P.M., Lin H., Chou K.C. iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Res. Int. 2014;2014:623149. doi: 10.1155/2014/623149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Guo S.H., Deng E.Z., Xu L.Q., Ding H., Lin H., Chen W., Chou K.C. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics. 2014;30:1522–1529. doi: 10.1093/bioinformatics/btu083. [DOI] [PubMed] [Google Scholar]
  • 80.Lin H., Deng E.Z., Ding H., Chen W., Chou K.C. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 2014;42:12961–12972. doi: 10.1093/nar/gku1019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Qiu W.R., Xiao X., Chou K.C. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int. J. Mol. Sci. 2014;15:1746–1766. doi: 10.3390/ijms15021746. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Liu B., Fang L., Liu F., Wang X., Chen J., Chou K.C. Identification of real microRNA precursors with a pseudo structure status composition approach. PLoS ONE. 2015;10:e0121501. doi: 10.1371/journal.pone.0121501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Kabir M., Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol. Genet. Genomics. 2016;291:285–296. doi: 10.1007/s00438-015-1108-5. [DOI] [PubMed] [Google Scholar]
  • 84.Chen W., Feng P., Ding H., Lin H., Chou K.C. Using deformation energy to analyze nucleosome positioning in genomes. Genomics. 2016;107:69–75. doi: 10.1016/j.ygeno.2015.12.005. [DOI] [PubMed] [Google Scholar]
  • 85.Tahir M., Hayat M. iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou’s PseAAC. Mol. Biosyst. 2016;12:2587–2593. doi: 10.1039/c6mb00221h. [DOI] [PubMed] [Google Scholar]
  • 86.Liu B., Liu F., Wang X., Chen J., Fang L., Chou K.C. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res. 2015;43(W1):W65–W71. doi: 10.1093/nar/gkv458. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Chou K.C. Low-frequency vibrations of DNA molecules. Biochem. J. 1984;221:27–31. doi: 10.1042/bj2210027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Chou K.C., Maggiora G.M., Mao B. Quasi-continuum models of twist-like and accordion-like low-frequency motions in DNA. Biophys. J. 1989;56:295–305. doi: 10.1016/S0006-3495(89)82676-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Chou K.C., Chen N.Y., Forsen S. The biological functions of low-frequency phonons: 2. Cooperative effects. Chem. Scr. 1981;18:126–132. [Google Scholar]
  • 90.Chou K.C., Mao B. Collective motion in DNA and its role in drug intercalation. Biopolymers. 1988;27:1795–1815. doi: 10.1002/bip.360271109. [DOI] [PubMed] [Google Scholar]
  • 91.Chou K.C. Low-frequency collective motion in biomacromolecules and its biological functions. Biophys. Chem. 1988;30:3–48. doi: 10.1016/0301-4622(88)85002-6. [DOI] [PubMed] [Google Scholar]
  • 92.Chou K.C., Zhang C.T. Diagrammatization of codon usage in 339 human immunodeficiency virus proteins and its biological implication. AIDS Res. Hum. Retroviruses. 1992;8:1967–1976. doi: 10.1089/aid.1992.8.1967. [DOI] [PubMed] [Google Scholar]
  • 93.Zhang C.T., Chou K.C. A graphic approach to analyzing codon usage in 1562 Escherichia coli protein coding sequences. J. Mol. Biol. 1994;238:1–8. doi: 10.1006/jmbi.1994.1263. [DOI] [PubMed] [Google Scholar]
  • 94.Chou K.C. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci. 1995;4:1365–1383. doi: 10.1002/pro.5560040712. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Feng P.M., Chen W., Lin H., Chou K.C. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem. 2013;442:118–125. doi: 10.1016/j.ab.2013.05.024. [DOI] [PubMed] [Google Scholar]
  • 96.Liu B., Zhang D., Xu R., Xu J., Wang X., Chen Q., Dong Q., Chou K.C. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics. 2014;30:472–479. doi: 10.1093/bioinformatics/btt709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Ding H., Deng E.Z., Yuan L.F., Liu L., Lin H., Chen W., Chou K.C. iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res. Int. 2014;2014:286419. doi: 10.1155/2014/286419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Fan Y.N., Xiao X., Min J.L., Chou K.C. iNR-Drug: predicting the interaction of drugs with nuclear receptors in cellular networking. Int. J. Mol. Sci. 2014;15:4915–4937. doi: 10.3390/ijms15034915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Xu Y., Wen X., Shao X.J., Deng N.Y., Chou K.C. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int. J. Mol. Sci. 2014;15:7594–7610. doi: 10.3390/ijms15057594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Liu B., Xu J., Lan X., Xu R., Zhou J., Wang X., Chou K.C. iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS ONE. 2014;9:e106691. doi: 10.1371/journal.pone.0106691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Qiu W.R., Xiao X., Lin W.Z., Chou K.C. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J. Biomol. Struct. Dyn. 2015;33:1731–1742. doi: 10.1080/07391102.2014.968875. [DOI] [PubMed] [Google Scholar]
  • 102.Xu R., Zhou J., Liu B., He Y., Zou Q., Wang X., Chou K.C. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J. Biomol. Struct. Dyn. 2015;33:1720–1730. doi: 10.1080/07391102.2014.968624. [DOI] [PubMed] [Google Scholar]
  • 103.Chen J., Long R., Wang X.L., Liu B., Chou K.C. dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation. Sci. Rep. 2016;6:32333. doi: 10.1038/srep32333. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Chou K.C., Cai Y.D. Using functional domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. 2002;277:45765–45769. doi: 10.1074/jbc.M204161200. [DOI] [PubMed] [Google Scholar]
  • 105.Cai Y.D., Zhou G.P., Chou K.C. Support vector machines for predicting membrane protein types by using functional domain composition. Biophys. J. 2003;84:3257–3263. doi: 10.1016/S0006-3495(03)70050-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Cristianini N., Shawe-Taylor J. Chapter 3. Cambridge University Press; 2000. (An Introduction to Support Vector Machines and Other Kernel-based Learning Methods). [Google Scholar]
  • 107.Chen J., Liu H., Yang J., Chou K.C. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids. 2007;33:423–428. doi: 10.1007/s00726-006-0485-9. [DOI] [PubMed] [Google Scholar]
  • 108.Chou K.C. Using subsite coupling to predict signal peptides. Protein Eng. 2001;14:75–79. doi: 10.1093/protein/14.2.75. [DOI] [PubMed] [Google Scholar]
  • 109.Xu Y., Ding J., Wu L.Y., Chou K.C. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE. 2013;8:e55844. doi: 10.1371/journal.pone.0055844. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Chen W., Feng P.M., Lin H., Chou K.C. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 2013;41:e68. doi: 10.1093/nar/gks1450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Chou K.C. Prediction of protein signal sequences and their cleavage sites. Proteins. 2001;42:136–139. doi: 10.1002/1097-0134(20010101)42:1<136::aid-prot130>3.0.co;2-f. [DOI] [PubMed] [Google Scholar]
  • 112.Xu Y., Shao X.J., Wu L.Y., Deng N.Y., Chou K.C. iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. PeerJ. 2013;1:e171. doi: 10.7717/peerj.171. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Jia J., Liu Z., Xiao X., Liu B., Chou K.C. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition. J. Biomol. Struct. Dyn. 2016;34:1946–1961. doi: 10.1080/07391102.2015.1095116. [DOI] [PubMed] [Google Scholar]
  • 114.Xu Y., Chou K.C. Recent progress in predicting posttranslational modification sites in proteins. Curr. Top. Med. Chem. 2016;16:591–603. doi: 10.2174/1568026615666150819110421. [DOI] [PubMed] [Google Scholar]
  • 115.Xu Y., Wen X., Wen L.S., Wu L.Y., Deng N.Y., Chou K.C. iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition. PLoS ONE. 2014;9:e105018. doi: 10.1371/journal.pone.0105018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Xiao X., Wu Z.C., Chou K.C. iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J. Theor. Biol. 2011;284:42–51. doi: 10.1016/j.jtbi.2011.06.005. [DOI] [PubMed] [Google Scholar]
  • 117.Chou K.C., Wu Z.C., Xiao X. iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. Mol. Biosyst. 2012;8:629–641. doi: 10.1039/c1mb05420a. [DOI] [PubMed] [Google Scholar]
  • 118.Xiao X., Wang P., Lin W.Z., Jia J.H., Chou K.C. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal. Biochem. 2013;436:168–177. doi: 10.1016/j.ab.2013.01.019. [DOI] [PubMed] [Google Scholar]
  • 119.Lin W.Z., Fang J.A., Xiao X., Chou K.C. iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins. Mol. Biosyst. 2013;9:634–644. doi: 10.1039/c3mb25466f. [DOI] [PubMed] [Google Scholar]
  • 120.Chou K.C. Some remarks on predicting multi-label attributes in molecular biosystems. Mol. Biosyst. 2013;9:1092–1100. doi: 10.1039/c3mb25555g. [DOI] [PubMed] [Google Scholar]
  • 121.Zhou G.P., Assa-Munt N. Some insights into protein structural class prediction. Proteins. 2001;44:57–59. doi: 10.1002/prot.1071. [DOI] [PubMed] [Google Scholar]
  • 122.Zhou G.P., Doctor K. Subcellular location prediction of apoptosis proteins. Proteins. 2003;50:44–48. doi: 10.1002/prot.10251. [DOI] [PubMed] [Google Scholar]
  • 123.Chou K.C., Cai Y.D. Prediction of membrane protein types by incorporating amphipathic effects. J. Chem. Inf. Model. 2005;45:407–413. doi: 10.1021/ci049686v. [DOI] [PubMed] [Google Scholar]
  • 124.Mondal S., Pai P.P. Chou’s pseudo amino acid composition improves sequence-based antifreeze protein prediction. J. Theor. Biol. 2014;356:30–35. doi: 10.1016/j.jtbi.2014.04.006. [DOI] [PubMed] [Google Scholar]
  • 125.Dehzangi A., Heffernan R., Sharma A., Lyons J., Paliwal K., Sattar A. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC. J. Theor. Biol. 2015;364:284–294. doi: 10.1016/j.jtbi.2014.09.029. [DOI] [PubMed] [Google Scholar]
  • 126.Khan Z.U., Hayat M., Khan M.A. Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J. Theor. Biol. 2015;365:197–203. doi: 10.1016/j.jtbi.2014.10.014. [DOI] [PubMed] [Google Scholar]
  • 127.Kumar R., Srivastava A., Kumari B., Kumar M. Prediction of β-lactamase and its class by Chou’s pseudo-amino acid composition and support vector machine. J. Theor. Biol. 2015;365:96–103. doi: 10.1016/j.jtbi.2014.10.008. [DOI] [PubMed] [Google Scholar]
  • 128.Ali F., Hayat M. Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition. J. Theor. Biol. 2015;384:78–83. doi: 10.1016/j.jtbi.2015.07.034. [DOI] [PubMed] [Google Scholar]
  • 129.Ahmad K., Waris M., Hayat M. Prediction of protein submitochondrial locations by incorporating dipeptide composition into Chou’s general pseudo amino acid composition. J. Membr. Biol. 2016;249:293–304. doi: 10.1007/s00232-015-9868-8. [DOI] [PubMed] [Google Scholar]
  • 130.Ju Z., Cao J.Z., Gu H. Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou's general PseAAC. J. Theor. Biol. 2016;397:145–150. doi: 10.1016/j.jtbi.2016.02.020. [DOI] [PubMed] [Google Scholar]
  • 131.Behbahani M., Mohabatkar H., Nosrati M. Analysis and comparison of lignin peroxidases between fungi and bacteria using three different modes of Chou’s general pseudo amino acid composition. J. Theor. Biol. 2016;411:1–5. doi: 10.1016/j.jtbi.2016.09.001. [DOI] [PubMed] [Google Scholar]

Supplementary Materials

Document S1. Supplemental Materials and Methods (mmc1.pdf)
Document S2. Article plus Supplemental Information (mmc2.pdf)

Articles from Molecular Therapy. Nucleic Acids are provided here courtesy of The American Society of Gene & Cell Therapy
