Skip to main content
BioMed Research International logoLink to BioMed Research International
. 2013 Dec 22;2013:674215. doi: 10.1155/2013/674215

Prediction of Substrate-Enzyme-Product Interaction Based on Molecular Descriptors and Physicochemical Properties

Bing Niu 1, Guohua Huang 2,3, Linfeng Zheng 4, Xueyuan Wang 1, Fuxue Chen 1, Yuhui Zhang 5,*, Tao Huang 6,*
PMCID: PMC3881445  PMID: 24455714

Abstract

It is important to correctly and efficiently predict the interaction of substrate-enzyme and to predict their product in metabolic pathway. In this work, a novel approach was introduced to encode substrate/product and enzyme molecules with molecular descriptors and physicochemical properties, respectively. Based on this encoding method, KNN was adopted to build the substrate-enzyme-product interaction network. After selecting the optimal features that are able to represent the main factors of substrate-enzyme-product interaction in our prediction, totally 160 features out of 290 features were attained which can be clustered into ten categories: elemental analysis, geometry, chemistry, amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity. As a result, our predicting model achieved an MCC of 0.423 and an overall prediction accuracy of 89.1% for 10-fold cross-validation test.

1. Introduction

With the completion of gene sequencing projects, scientific focus is shifting from the investigation of the proteomics to metabonomics which is of chemical processes involving metabolites. Metabolism consists of almost all of the chemical-chemical reactions or chemical-macromolecules reactions that generally take place within metabolic pathway [1]. Above linked individual interactions form the whole metabolic pathway and interaction network which produce more new complex and higher order structure [2]. Metabolic pathways are sequences of metabolic steps forming highly regulated networks of interacting enzymes and substrates. In metabolic pathways, the substrate is transformed through a series of steps into another chemical, by a sequence of enzymes. Given a substrate and an enzyme, people may wonder whether they can interact with each other or what is the product. Herein, network of interaction of substrate-enzyme-product can provide assistance in R&D of drug. For example, based on interaction of substrate-enzyme-product, maybe people can discover some candidate drug from nature product, and can even predict its potential side effect [3]. Besides this, network of interaction of substrate-enzyme-product can also be applied in evaluating the safety of research of Genetically Modified Food (GMF). By using the network of substrate-enzyme-product, the potential toxicity of product derived from GMF could be predicted. Hence, the interaction network of substrate-enzyme-product will provide us further knowledge and information beyond metabolic pathway.

Due to the complexity of metabolic pathways, it is both time-consuming and costly to determine the interaction of substrate-enzyme-product by experiments. It is in urgent to develop a quick, reliable, and effective approach to predict the interactions among substrate, enzyme, and product.

In this study, we reported a computational approach for predicting the network of substrate-enzyme-product triads based on K-nearest neighbor (KNN) [46] algorithm combined with mRMR-IFS feature selection method.

2. Methods and Materials

2.1. Methods

2.1.1. mRMR

Minimum Redundancy Maximum Relevance (mRMR), proposed by Peng et al., is an effective feature-selection method for evaluating the worth of an attribute by considering the minimum redundancy between attributes and the maximum relevance between attributes and targets [7]. More information of mRMR selection algorithm can be found in [7] and related studies [819].

2.1.2. KNN

K-nearest neighbors (KNN) is the most basic instance-based machine learning technique classifying objects based on cluster theory [46]. KNN recognizes a sample's class according to the label on the K-nearest neighbors. The nearest neighbors of an instance are defined by the Euclidean distance [4]. KNN has been widely applied in the field of biological sciences [2024]. More details about KNN can be referred to in [25, 26].

2.1.3. Incremental Feature Selection (IFS)

First, construct N feature subset by incrementally adding features to D as follows:

D0={f0},D1={f0,fi1},Di={f0,f1,fi},DN1={f0,f1,,fN1} (1)

(f i is the ith feature added into feature subset D).

Second, use KNN method to build the prediction model based on subset D i and evaluate the model by cross-validation. Then, a classification accuracy curve called IFS curve is attained.

2.2. Materials

2.2.1. Data Preparation

In this study, 14,229 compounds derived from database KEGG (http://www.genome.jp/kegg/) (release 42 in 2006) [27] were collected. After removing the compounds which do not participate in any metabolic reactions which have been supported by experiments, 1326 compounds and 939 enzyme molecules of the human genome participating metabolic reaction were obtained (please refer to Supplemental Material available online at http://dx.doi.org/10.1155/2013/674215).

In metabolic pathway, each substrate binds to one or more enzymes, but the production may not be different. Therefore, substrates and enzymes are subject to be involved in a network of interactions. In this study, substrate, enzyme, and product in each interaction are defined as a positive sample; and those that cannot interact with each other or those interactions that cannot attain the product are defined as negative samples. Triads in the positive set are termed as networking triads, and those in the negative set as nonnetworking triads. These networking triads are supported by solid experiments with 100% credibility by KEGG. As a result, 14,592 networking triads were obtained. To generate the negative datasets, firstly, we built a dataset by randomly combining two small molecules and an enzyme together; then, we removed the 14,592 networking triads. It should be mentioned that although some nonnetworking triads may not be true nonnetworking triads by chance in negative database set, the chance is small. Therefore, the credibility of the negative dataset is also very high. To reflect that the number of networking triads is much less than that of the nonnetworking triads, the negative samples of training set were generated 50 times as many as the positive ones. As a result, the final training dataset contains 14,592 networking triads and 729,600 nonnetworking triads (please refer to supplemental material II and III for the data).

2.2.2. Representation of Compounds

In developing a method for predicting drug-protein interaction, the first problem is how to describe this networking triad correctly as input for the prediction program. It is obvious that the performances of prediction model depend mostly on the features used to describe the molecular structures. In this study, molecular descriptors were applied to reflect the physicochemical and geometric properties of substrates and products which have been applied in our previous studies [2830]. The values of these molecular descriptors were calculated by program ChemAxon which is available for computing the molecular descriptors [31, 32] (see supplemental material IV). As some molecular descriptors cannot be calculated for some compounds, finally totally 79 molecular descriptors are used in building the model. Before calculating molecule descriptors, the compounds' three-dimensional structures were optimized by using MM+ force field with the Polak-Ribiere algorithm until the root-mean-square gradient became less than 0.1 Kcal/mol. Then, the descriptors were calculated under stable conformation of each molecule based on AM1 semiempirical molecular orbital method at the restricted Hartree-Fock level with no configuration interaction.

2.2.3. Representation of Enzymes

As each protein has its own physicochemical properties, like hydrophobicity, polarizability, and so vent accessibility, it is a good method to describe a protein sequence, and it has been employed for predicting various protein attributes. In this paper, the enzymes are encoded by 132 physicochemical descriptors (amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity) [3338] (see supplemental material V) due to its effective and selective ability in the prediction of protein characteristics. More details can be seen in reference [3338] or our previous study [39].

2.3. Accuracy Measure

Generally speaking, the prediction performance of different discriminative methods is commonly evaluated by the function of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In this study, we employed sensitivity (SN = TP/[TP + FN]), specificity (SP = TN/[TN + FP]), overall accuracy (ACC = [TP + TN]/[TP + TN + FP + FN]), and Matthew's correlation coefficient (MCC) to measure the prediction. The MCC can be represented as

MCC=TP×TNFP×FN(TN+FN)×(TN+FP)×(TP+FN)×(TP+FP). (2)

3. Results

In the recent years, many efforts have been made in feature selection [4046]. In this study, mRMR method was applied to search for a subset with optimal features. After mRMR calculation, two tables are attained (see supplemental material VI). One is called MaxRel feature table that ranks the features based on their relevance to the class of samples and the other is called mRMR feature table that lists the ranked features by the maximum relevance and minimum redundancy to the class of samples.

Then, IFS method is applied based on mRMR feature table. From Figure 1, it can be found that while adding new feature continually, the value of MCC increased, although during this process, the value of MCC decreased at some point. While the number of features reaches 160, the value of MCC is 0.423, the highest point. Then, the value of MCC begins to decrease. Hence, the subset containing these 160 features is considered as an optimal subset which is derived from original data set containing 290 features. These features selected are irrelevant to each other but relevant to the target.

Figure 1.

Figure 1

The curve of the 290 prediction models using IFS.

Based on the 160 features, predicting model of network of substrate-enzyme-production interaction could be built.

Ten folds cross-validation test, which is applied in many other applications [36, 4752], is adopted in this study to validate the model's prediction accuracy. During 10-fold cross-validation test, the datasets are divided into 10-folds, a model is built with N-1 fold samples and the 10th fold data are treated as unseen data, which is used for the prediction as the testing data. Each fold is left out from building the model and predicted in turn. The predictive ability is evaluated by averaging the correct prediction rates of the 10-fold data. Table 1 lists the prediction results while using KNN method.

Table 1.

Prediction accuracies of different dataset with KNN.

Dataset 10-folds cross-validation test
SN (%) SP (%) ACC (%) MCC
Original dataset 53.71 92.4 88.9 0.412
Optimal dataset 55.2 92.4 89.1 0.423

To evaluate our feature selection method, we compared the prediction results generated by final optimal subset and the original data set with 10-folds cross validation test (see Table 1). Table 1 shows that the prediction results of the 10-folds cross-validation test improved after applying feature selection. This demonstrates that maybe some features are redundant and interfering to each other in the original dataset; hence, it is better to remove some of them. Furthermore, the number of features in the final subsets is 55% of the original feature set. This result suggests that mRMR feature selection approach could make a good optimization and improve the accuracy of prediction for substrate-enzyme-product interaction.

4. Discussion

The selected 160 features in the final subset can be clustered into the following ten categories: elemental analysis, geometry, chemistry, amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity (see Figure 2). The former three kind features are molecular descriptors which are of substrate and product, and the left seven kind features are of enzyme.

Figure 2.

Figure 2

Feature distribution.

According to the distribution of features of compounds (substrate and product) and enzymes, it shows that enzymes contribute more to the interaction process. Further calculating the proposition of the selected features to the original features, it is found that the proposition of enzyme feature (92/132 = 0.70) is higher than the proposition of compound feature (70/158 = 0.44). Table 2 also shows that several enzyme features are in the top ten and top twenty features. This result suggests that enzyme-centric features make more contributions to our proposed interactions network of substrate-enzyme-product.

Table 2.

Top 80 features rank according to their correlation to target.

No. Name Categories No. Name Categories
1 Polarity Polarity 41 Amino Acids Composition Cys Amino acids composition
2 Substrate_
Polarizability
Chemical 42 Polarizability Polarizability
3 Solvent accessibility Solvent accessibility 43 Polarizability Polarizability
4 Solvent accessibility Solvent accessibility 44 Amino Acids Composition Ile Amino acids composition
5 Secondary structure Secondary structure 45 Hydrophobicity Hydrophobicity
6 Normalized Van Der Waals volume Normalized Van Der Waals volume 46 Secondary structure Secondary structure
7 Normalized Van Der Waals volume Normalized Van Der Waals volume 47 Substrate_Stereo
DoubleBondCount
Geometry
8 Secondary structure Secondary structure 48 Normalized Van Der Waals volume Normalized Van Der Waals volume
9 Secondary structure Secondary structure 49 Substrate_Smallest
RingSystemSize
Geometry
10 Substrate_
LogP
Chemical 50 Substrate_Smallest
RingSize
Geometry
11 Substrate_
CComposition
Elemental analysis 51 Substrate_Rotatable
BondCount
Geometry
12 Amino Acids Composition Asn Amino acids composition 52 Substrate_H
Composition
Elemental analysis
13 Polarity Polarity 53 Amino Acids Composition Thr Amino acids composition
14 Hydrophobicity Hydrophobicity 54 Polarizability Polarizability
15 Substrate_MinZ Geometry 55 Amino Acids Composition Leu Amino acids composition
16 Solvent accessibility Solvent accessibility 56 Amino Acids Composition His Amino acids composition
17 Polarity Polarity 57 Substrate_CarboAliphatic
RingCount
Geometry
18 Hydrophobicity Hydrophobicity 58 Product_HComposition Elemental analysis
19 Substrate_VanDerWaals
SurfaceArea
Chemical 59 Polarizability Polarizability
20 Amino Acids Composition Asp Amino acids composition 60 Normalized Van Der Waals volume Normalized Van Der Waals volume
21 Hydrophobicity Chemical 61 Amino Acids Composition Gln Amino acids composition
22 Substrate_
OComposition
Elemental analysis 62 Normalized Van Der Waals volume Normalized Van Der Waals volume
23 Solvent accessibility Solvent accessibility 63 Polarizability Polarizability
24 Secondary structure Secondary structure 64 Amino Acids Composition Lys Amino acids Composition
25 Amino Acids Composition Ser Amino acids composition 65 Polarizability Polarizability
26 Substrate_Water
AccessibleSurface
Area Negative
Chemical 66 Amino Acids Composition Tyr Amino acids composition
27 Secondary structure Secondary structure 67 Amino Acids Composition Arg Amino acids composition
28 Hydrophobicity Hydrophobicity 68 Secondary structure Secondary structure
29 Substrate_FusedRingCount Geometry 69 Polarizability Polarizability
30 Substrate_Carbo
RingCount
Geometry 70 Normalized Van Der Waals volume Normalized Van Der Waals volume
31 Amino Acids Composition Glu Amino acids composition 71 Polarity Polarity
32 Hydrophobicity Hydrophobicity 72 Normalized Van Der Waals volume Normalized Van Der Waals volume
33 Polarizability Polarizability 73 Product_NComposition Elemental analysis
34 Polarity Polarity 74 Solvent accessibility Solvent accessibility
35 Normalized Van Der Waals volume Normalized Van Der Waals volume 75 Product_Hetero
AliphaticRingCount
Geometry
36 Substrate_Fused
Aliphatic
RingCount
Geometry 76 Substrate_CarboAromatic
RingCount
Geometry
37 Polarizability Polarizability 77 Substrate_PComposition Elemental analysis
38 Secondary structure Secondary structure 78 Hydrophobicity Hydrophobicity
39 Substrate_RingCount Geometry 79 Product_CComposition Elemental analysis
40 Amino Acids Composition Pro Amino acids composition 80 Normalized Van Der Waals volume Normalized Van Der Waals volume

From Table 2, it can be further found that for compound features, there are much less features of product than features of substrate and enzyme in the top fifty features. This is because during the interaction of substrate-enzyme-product, substrate and enzyme determine the products, and changing substrate or enzyme could result in a different product.

According to the distribution of features in Figure 2, it can be found that the number of geometry features is more than that of the other kind features. In this regard, geometry features have great effect and contribute to the substrate-enzyme-product interaction not only in substrate features but also in product features. However, from MaxRel feature table, it can be found that there are not many geometry features appearing in the top ten features. Therefore, we feel interesting of this problem. Actually, the order of geometry features is not incompatible with its distribution. Geometry features contain information of the structure of a molecule like the volume, size, and shape which leads to steric hindrance and steric resistance. These factors are of great importance in substrate-enzyme-product interaction. Only correctly three-dimensional size and shape molecule can interact with enzyme according to the Lock and Key Theory. Meanwhile, steric hindrance or steric resistance affect the substrate-enzyme-products' interaction as some big functional groups like aromatic ring prevent interaction. On the other hand, these functional groups also provide key interactive force to enzyme like heteroaromatics ring's π-π stacking interaction to enzyme's functional site. The substrates and products are varied and diverse greatly in structure. And it is difficult to describe their structure with only one or two descriptors. Hence, more geometry features could better extract the information of compounds' structure. This is why though single geometry feature has no strong relevance to the interaction, the overall contribution of the forty-four geometry feature can often be crucial to the interaction.

Figure 2 also shows that amino acid compositions and second structure occupied important propositions among the ten types' features. Amino acid composition in the binding site contributes a lot in substrate-enzyme-product interaction because it could affect the state energy. Some experiments have verified the importance for amino acid compositions in protein related interaction [5355]. For example, Tyr265 plays a central role in enzyme alanine racemase's binding to L-alanine and pyridoxal 5-phosphate [54]. Hence, for a unique structure, the amino acid composition plays the essential role in the interactions. Secondary structure is considered as an important property in many protein related problems, since the shape and biological function of a protein are mainly determined by its secondary structures. Secondary structure features reflect the steric structure of protein. According to the Lock and Key Theory, the size and shape of substrate were rigid and restricted by enzyme. Accordingly, secondary structure has relatively more impact on the determination of substrate and product.

5. Conclusion

In this paper, a feature selection method called mRMR combined with IFS was applied to dataset of substrate-enzyme-product interaction which is encoded with molecular descriptors of substrate/product and 132 physicochemical protein descriptors. As a result, we find that enzymes are essential in substrate-enzyme-product interaction; 160 important features were abstracted from 290 features. Based on the above findings, we also used KNN method to build a prediction model of substrate-enzyme product interaction. Based on the prediction results, it is expected that molecular descriptors and 132 physicochemical protein descriptors can be served as an efficient coding method for network of substrate-enzyme-product interaction.

Supplementary Material

Supplementary Material 1 The ID of 1326 compounds and 939 enzymes of metabolic pathway in Kegg database.

Supplementary Material 214,592 networking triads of substrate-enzyme-product.

Supplementary Material 3: 729,600 non-networking triads of substrate-enzyme-product.

Supplementary Material 4: Calculated value of molecular descriptors of 1326 compounds and the categories.

Supplementary Material 5: The value of 132 physicochemical descriptors for 939 enzyme.

Supplementary Material 6: The MaxRel and mRMR feature tables using mRMR method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the National Natural Science Foundation of China (20973108, 81271384), the Natural Science Foundation of Shanghai Science and Technology Commission (12ZR1424900), and the Shanghai Key Laboratory of Bio-energy Crops (13DZ2272100) for their financial support. We also acknowledge ChemAxon for their excellent products.

References

  • 1.Papin JA, Price ND, Wiback SJ, Fell DA, Palsson BO. Metabolic pathways in the post-genome era. Trends in Biochemical Sciences. 2003;28(5):250–258. doi: 10.1016/S0968-0004(03)00064-1. [DOI] [PubMed] [Google Scholar]
  • 2. Barabási A-L, Oltvai ZN. Network biology: understanding the cell's functional organization. Nature Reviews Genetics. 2004;5(2):101–113. doi: 10.1038/nrg1272. [DOI] [PubMed] [Google Scholar]
  • 3.Reichard P. Ribonucleotide reductases: the evolution of allosteric regulation. Archives of Biochemistry and Biophysics. 2002;397(2):149–155. doi: 10.1006/abbi.2001.2637. [DOI] [PubMed] [Google Scholar]
  • 4.Huberty CJ. Applied Discriminant Analysis. Vol. 297. New York, NY, USA: John Wiley & Sons; 1994. [Google Scholar]
  • 5.Fix E, Hodges JL. Discriminatory analysis. Nonparametric discrimination: consistency properties. USAF School of Aviation Medicine: Randolph Field, pp. 261-279, San Antonio, Tex, USA, 1951.
  • 6.Johnson RA, Wichern DW. Applied MultiVariate Statistical Analysis. 5th edition. Englewood Cliffs, NJ, USA: Prentice Hall; 1982. [Google Scholar]
  • 7.Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(8):1226–1238. doi: 10.1109/TPAMI.2005.159. [DOI] [PubMed] [Google Scholar]
  • 8.Niu B, Lu L, Liu L, et al. HIV-1 protease cleavage site prediction based on amino acid property. Journal of Computational Chemistry. 2009;30(1):33–39. doi: 10.1002/jcc.21024. [DOI] [PubMed] [Google Scholar]
  • 9.Cai Y, He J, Li X, et al. Prediction of protein subcellular locations with feature selection and analysis. Protein and Peptide Letters. 2010;17(4):464–472. doi: 10.2174/092986610790963654. [DOI] [PubMed] [Google Scholar]
  • 10.Cai Y, He Z, Shi X, Kong X, Gu L, Xie L. A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach. Molecules and Cells. 2010;30(2):99–105. doi: 10.1007/s10059-010-0093-0. [DOI] [PubMed] [Google Scholar]
  • 11.Cai Y, Huang T, Hu L, Shi X, Xie L, Li Y. Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids. 2012;42(4):1387–1395. doi: 10.1007/s00726-011-0835-0. [DOI] [PubMed] [Google Scholar]
  • 12.Chen L, He Z-S, Huang T, Cai Y-D. Using compound similarity and functional domain composition for prediction of drug-target interaction networks. Medicinal Chemistry. 2010;6(6):388–395. doi: 10.2174/157340610793563983. [DOI] [PubMed] [Google Scholar]
  • 13.Hu L-L, Niu S, Huang T, Wang K, Shi X-H, Cai Y-D. Prediction and analysis of protein hydroxyproline and hydroxylysine. PLoS One. 2010;5(12) doi: 10.1371/journal.pone.0015917.e15917 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Li B, Feng K, Chen L, Huang T, Cai Y. Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS. PLoS One. 2012;7 doi: 10.1371/journal.pone.0043927.e43927 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Li B-Q, Hu L-L, Niu S, Cai Y-D, Chou K-C. Predict and analyze S-nitrosylation modification sites with the mRMR and IFS approaches. Journal of Proteomics. 2012;75(5):1654–1665. doi: 10.1016/j.jprot.2011.12.003. [DOI] [PubMed] [Google Scholar]
  • 16.Li B-Q, Huang T, Liu L, Cai Y-D, Chou K-C. Identification of colorectal cancer related genes with mrmr and shortest path in protein-protein interaction network. PLoS One. 2012;7(4) doi: 10.1371/journal.pone.0033393.e33393 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lin K, Qian Z, Lu L, et al. Predicting miRNA's target from primary structure by the nearest neighbor algorithm. Molecular Diversity. 2010;14(4):719–729. doi: 10.1007/s11030-009-9216-y. [DOI] [PubMed] [Google Scholar]
  • 18.Niu S, Huang T, Feng K, Cai Y, Li Y. Prediction of tyrosine sulfation with mRMR feature selection and analysis. Journal of Proteome Research. 2010;9(12):6490–6497. doi: 10.1021/pr1007152. [DOI] [PubMed] [Google Scholar]
  • 19.Yuan Y, Shi X, Li X, et al. Prediction of interactiveness of proteins and nucleic acids based on feature selections. Molecular Diversity. 2010;14(4):627–633. doi: 10.1007/s11030-009-9198-9. [DOI] [PubMed] [Google Scholar]
  • 20.Cai Y-D, Chou K-C. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics. 2004;20(7):1151–1156. doi: 10.1093/bioinformatics/bth054. [DOI] [PubMed] [Google Scholar]
  • 21.Cai Y-D, Doig AJ. Prediction of Saccharomyces cerevisiae protein functional class from functional domain composition. Bioinformatics. 2004;20(8):1292–1300. doi: 10.1093/bioinformatics/bth085. [DOI] [PubMed] [Google Scholar]
  • 22.Chou K-C, Cai Y-D. Predicting protein-protein interactions from sequences in a hybridization space. Journal of Proteome Research. 2006;5(2):316–322. doi: 10.1021/pr050331g. [DOI] [PubMed] [Google Scholar]
  • 23.Qian Z, Cai Y-D, Li Y. A novel computational method to predict transcription factor DNA binding preference. Biochemical and Biophysical Research Communications. 2006;348(3):1034–1037. doi: 10.1016/j.bbrc.2006.07.149. [DOI] [PubMed] [Google Scholar]
  • 24.Song J. Prediction of homo-oligomeric proteins based on nearest neighbour algorithm. Computers in Biology and Medicine. 2007;37(12):1759–1764. doi: 10.1016/j.compbiomed.2007.05.002. [DOI] [PubMed] [Google Scholar]
  • 25.Cover TM, Hart PE. Nearst neighbor pattem classlfication. IEEE Transactions on Information Theory. 1967;13(1):21–27. [Google Scholar]
  • 26.Friedman JH, Baskett F, Shustek LJ. An algorithm for finding nearest neighbors . IEEE Transactions on Computers. 1975;C-24(10):1000–1006. [Google Scholar]
  • 27.Goto S, Nishioka T, Kanehisa M. LIGAND: chemical database for enzyme reactions. Bioinformatics. 1998;14(7):591–599. doi: 10.1093/bioinformatics/14.7.591. [DOI] [PubMed] [Google Scholar]
  • 28.Niu B, Gu L, Peng CR, Ding J, Yuan XC, Lu WC. Small molecules' multi-metabolic pathways prediction using physico-chemical features and multi-task learning method. Current Bioinformatics. 2013;8(5):564–568. [Google Scholar]
  • 29.Peng C-R, Lu W-C, Niu B, Li M-J, Yang X-Y, Wu M-L. Predicting the metabolic pathways of small molecules based on their physicochemical properties. Protein & Peptide Letters. 2012;19(12):1250–1256. doi: 10.2174/092986612803521585. [DOI] [PubMed] [Google Scholar]
  • 30.Peng C-R, Lu W-C, Niu B, Li Y-J, Hu L-L. Prediction of the functional roles of small molecules in lipid metabolism based on ensemble learning. Protein & Peptide Letters. 2012;19(1):108–112. doi: 10.2174/092986612798472802. [DOI] [PubMed] [Google Scholar]
  • 31.Weber L. JChem Base—ChemAxon. Chemistry World-Uk. 2008;5(10):65–66. [Google Scholar]
  • 32.Csizmadia F. JChem: java applets and modules supporting chemical database handling from web browsers. Journal of Chemical Information and Computer Sciences. 2000;40(2):323–324. doi: 10.1021/ci9902696. [DOI] [PubMed] [Google Scholar]
  • 33.Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim S-H. Recognition of a protein fold in the context of the SCOP classification. Proteins: Structure, Function and Genetics. 1999;35(4):401–407. [PubMed] [Google Scholar]
  • 34.Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annual Review of Biochemistry. 1990;59:1007–1039. doi: 10.1146/annurev.bi.59.070190.005043. [DOI] [PubMed] [Google Scholar]
  • 35.Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins: Structure, Function and Genetics. 1997;27(3):329–335. doi: 10.1002/(sici)1097-0134(199703)27:3<329::aid-prot1>3.0.co;2-8. [DOI] [PubMed] [Google Scholar]
  • 36.Mucchielli-Giorgi MH, Hazout S, Tufféry P. PredAcc: prediction of solvent accessibility. Bioinformatics. 1999;15(2):176–177. doi: 10.1093/bioinformatics/15.2.176. [DOI] [PubMed] [Google Scholar]
  • 37.Dubchak I, Muchnik I, Holbrook SR, Kim S-H. Prediction of protein folding class using global description of amino acid sequence. Proceedings of the National Academy of Sciences of the United States of America. 1995;92(19):8700–8704. doi: 10.1073/pnas.92.19.8700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH. Recognition of a protein fold in the context of the structural classification of proteins (SCOP) classification. Proteins. 1999;35(4):401–407. [PubMed] [Google Scholar]
  • 39.Niu B, Jin Y, Lu L, et al. Prediction of interaction between small molecule and enzyme using AdaBoost. Molecular Diversity. 2009;13(3):313–320. doi: 10.1007/s11030-009-9116-1. [DOI] [PubMed] [Google Scholar]
  • 40.He Z, Zhang J, Shi X-H, et al. Predicting drug-target interaction networks based on functional groups and biological features. PLoS One. 2010;5(3) doi: 10.1371/journal.pone.0009603.e9603 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hu L, Huang T, Shi X, Lu W-C, Cai Y-D, Chou K-C. Predicting functions of proteins in mouse based on weighted protein-protein interaction network and protein hybrid properties. PLoS One. 2011;6(1) doi: 10.1371/journal.pone.0014556.e14556 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Huang T, Shi X-H, Wang P, et al. Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks. PloS One. 2010;5(6) doi: 10.1371/journal.pone.0010972.e10972 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Wang P, Hu L, Liu G, et al. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods. PLoS One. 2011;6(4) doi: 10.1371/journal.pone.0018476.e18476 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Chen L, Feng K-Y, Cai Y-D, Chou K-C, Li H-P. Predicting the network of substrate-enzyme-product triads by combining compound similarity and functional domain composition. BMC Bioinformatics. 2010;11(article 293) doi: 10.1186/1471-2105-11-293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Chen L, Huang T, Shi X-H, Cai Y-D, Chou K-C. Analysis of protein pathway networks using hybrid properties. Molecules. 2010;15(11):8177–8192. doi: 10.3390/molecules15118177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Chou KC, Shen H-B. Review: recent advances in developing web-servers for predicting protein attributes. Natural Science. 2009;1(2):63–92. [Google Scholar]
  • 47.Creighton TE. Proteins: Structures and Molecular Properties. 2nd edition. New York, NY, USA: W. H. Freeman; 1993. [Google Scholar]
  • 48.Tusnády GE, Simon I. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. Journal of Molecular Biology. 1998;283(2):489–506. doi: 10.1006/jmbi.1998.2107. [DOI] [PubMed] [Google Scholar]
  • 49.Freund Y, Mansour Y, Schapire RE. Generalization bounds for averaged classifiers. Annals of Statistics. 2004;32(4):1698–1722. [Google Scholar]
  • 50.Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences. 1997;55(1):119–139. [Google Scholar]
  • 51.Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics. 1998;26(5):1651–1686. [Google Scholar]
  • 52.Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Machine Learning. 1999;37(3):297–336. [Google Scholar]
  • 53.Yudushkin IA, Schleifenbaum A, Kinkhabwala A, Neel BG, Schultz C, Bastiaens PIH. Live-cell imaging of enzyme-substrate interaction reveals spatial regulation of PTP1B. Science. 2007;315(5808):115–119. doi: 10.1126/science.1134966. [DOI] [PubMed] [Google Scholar]
  • 54.Ondrechen MJ, Briggs JM, McCammon JA. A model for enzyme-substrate interaction in alanine racemase. Journal of the American Chemical Society. 2001;123(12):2830–2834. doi: 10.1021/ja0029679. [DOI] [PubMed] [Google Scholar]
  • 55.Sadasivan C, Yee VC. Interaction of the factor XIII activation peptide with α-thrombin. Crystal structure of its enzyme-substrate analog complex. Journal of Biological Chemistry. 2000;275(47):36942–36948. doi: 10.1074/jbc.M006076200. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material 1 The ID of 1326 compounds and 939 enzymes of metabolic pathway in Kegg database.

Supplementary Material 214,592 networking triads of substrate-enzyme-product.

Supplementary Material 3: 729,600 non-networking triads of substrate-enzyme-product.

Supplementary Material 4: Calculated value of molecular descriptors of 1326 compounds and the categories.

Supplementary Material 5: The value of 132 physicochemical descriptors for 939 enzyme.

Supplementary Material 6: The MaxRel and mRMR feature tables using mRMR method.


Articles from BioMed Research International are provided here courtesy of Wiley

RESOURCES