Abstract
With the rapid advance in genomics, proteomics, metabolomics, and other types of omics technologies during the past decades, a tremendous amount of data related to molecular biology has been produced. It is becoming a big challenge for the bioinformatists to analyze and interpret these data with conventional intelligent techniques, for example, support vector machines. Recently, the hybrid intelligent methods, which integrate several standard intelligent approaches, are becoming more and more popular due to their robustness and efficiency. Specifically, the hybrid intelligent approaches based on evolutionary algorithms (EAs) are widely used in various fields due to the efficiency and robustness of EAs. In this review, we give an introduction about the applications of hybrid intelligent methods, in particular those based on evolutionary algorithm, in bioinformatics. In particular, we focus on their applications to three common problems that arise in bioinformatics, that is, feature selection, parameter estimation, and reconstruction of biological networks.
1. Introduction
During the past decade, large amounts of biological data have been generated thanks to the development of high-throughput technologies. For example, 1,010,482 samples were profiled and deposited in Gene Expression Omnibus (GEO) database [1] by the writing of this paper, where around thousands of genes on average were measured for each sample. The recently released pilot data from the 1000 genomes project indicate that there are 38 million SNPs (single-nucleotide polymorphism) and 1.4 million biallelic indels within the 14 populations investigated [2]. Beyond that, other large-scale omics data, for example, RNA sequencing and proteomics data, can be found in public databases and are being generated everyday around the world. Despite the invaluable knowledge hidden in the data, unfortunately, the analysis and interpretation of these data lag far behind data generation.
It has been a long history that intelligent methods from artificial intelligence were widely used in bioinformatics, where these approaches were utilized to analyze and interpret the big datasets that cannot be handled by biologists. For example, in their pioneering work, Golub et al. utilized self-organizing maps (SOMs) to discriminate acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL) based only on gene expression profiles without any prior knowledge [3]. Later, support vector machine was employed to classify 14 tumor types based on microarray gene expression data [4]. Except for diagnosis, intelligent methods have been exploited to identify biomarkers [5], annotate gene functions [6], predict drug targets [7, 8], and reverse engineering signaling pathways [9], among others.
Despite the success achieved by standard intelligent methods, it is becoming evident that it is intractable to analyze the large-scale omics data with only single standard intelligent approaches. For example, when diagnosing cancers based on gene expression profiles, low accuracy is expected if a traditional classifier, for example, linear discriminant analysis (LDA), is employed to classify the samples based on all the genes measured. This phenomenon is caused due to the “large p small n” paradigm which arises in microarray data, where there are generally around 20 thousand of genes or variables that were measured for each sample while only tens or at most hundreds of samples were considered in each experiment. In other words, there are very few samples while a much larger number of variables are to be learned by the intelligent methods, that is, the curse of dimensionality problem. Therefore, it is necessary to employ other intelligent techniques to select a small number of informative features first, based on which a classifier can be constructed to achieve the desired prediction accuracy. Such hybrid intelligent methods, that is, the combination of several traditional intelligent approaches, are being proved useful in analyzing the big complex biological data and are therefore becoming more and more popular.
In this paper, we survey the applications of hybrid intelligent methods in bioinformatics, which can help the researchers from both fields to understand each other and boost their future collaborations. In particular, we focus on the hybrid methods based on evolutionary algorithm due to its popularity in bioinformatics. We introduce the applications of hybrid intelligent methods to three common problems that arise in bioinformatics, that is, feature selection, parameter estimation, and molecular network/pathway reconstruction.
2. Evolutionary Algorithm
In this section, we first briefly introduced evolutionary algorithm, which is actually a family of algorithms inspired by the evolutionary principles in nature. In the evolutionary algorithm family, there are various variants, such as genetic algorithm (GA) [10, 11], genetic programming (GP) [12], evolutionary strategies (ES) [13], evolutionary programming (EP) [14], and differential evolution (DE) [15]. However, the principle underlying all these algorithms is the same that tries to find the optimal solutions by the operations of reproduction, mutation, recombination, and natural selection on a population of candidate solutions. In the following parts, we will take genetic algorithm (GA) as an example to introduce the evolutionary algorithm.
Figure 1 presents a schematic flowchart of genetic algorithm. In genetic algorithm, each candidate solution should be represented in an appropriate way that can be handled by the algorithm. For example, given a pool of candidate solutions X of size M, X = {x 1,x 2,…,x M}T, a candidate solution x i, that is, an individual, can be represented as a binary string x i = [0,0, 1,0,…, 1]. Take feature selection as an example; each individual represents a set of features to be selected, where element 1 in the individual means that the corresponding feature is selected and vice versa. After the representation of individuals is determined, a pool of initial solutions is generally randomly generated first.
To evaluate each individual in the candidate solution pool, a fitness function or evaluation function F is defined in the algorithm. The fitness function is generally defined by taking into account the domain knowledge and the optimal objective function to be solved. For instance, the prediction accuracy or classification error can be used as fitness function. If an individual leads to better fitness, it is a better solution and vice versa.
Once the fitness function is determined, the current population will go through two steps: selection and crossover and mutation. In selection step, a subset of individual solutions will be selected generally based on certain probability, and the selected solutions will be used as parents to breed next generation. In the next step, a pair of parent solutions will be picked from the selected parents to generate a new solution with crossover operation; meanwhile, mutation(s) can be optionally applied to certain element(s) within a parent individual to generate a new one. The procedure of crossover and/or mutation continues until a new population of solutions of similar size is generated.
The genetic algorithm repeats the above procedure until certain criterion is met; that is, the preset optimal fitness is found or a fixed number of generations are reached. Despite the common principles underlying the evolutionary algorithm family, other variants of the algorithm may have implementation procedures that are different from the genetic algorithm. For example, in differential evolution, the individuals are selected based on greedy criterion to make sure that all individuals in the new generation are better than or at least as good as the corresponding ones in current population. Another alternative of the traditional genetic algorithm, namely, memetic algorithm (MA), utilizes a local search technique to improve the fitness of each individual and reduce the risk of premature convergence.
Since the evolutionary algorithm starts with a set of random candidate solutions and evaluates multiple individuals at the same time, the risk of getting stuck in a local optimum is reduced. Furthermore, the evolutionary algorithm can generally find optimal solutions within reasonable time, thereby becoming a popular technique in various fields.
3. Feature Selection in Bioinformatics
In bioinformatics, various problems are equivalent to feature selection problem. For example, in bioinformatics, biomarker discovery is one important and popular topic that tries to identify certain markers, for example, genes or mutations, which can be used for disease diagnosis. It is obvious that biomarker identification is equivalent to feature selection if we consider genes or mutations of interest as variables, where the informative genes or mutations are generally picked to discriminate disease samples from normal ones. However, it is not an easy task to select a few informative variables (generally <20) from thousands or even tens of thousands of features. Under the circumstances, the evolutionary algorithm has been widely adopted for identifying biomarkers along with other intelligent methods. Figure 2 depicts the procedure of feature selection with GA, where GA generally works together with a classifier as a wrapper method and the classifier is used to evaluate the selected features in each iteration. For example, Li et al. [16] utilized genetic algorithm and k-nearest neighbor (KNN) classifier to find discriminative genes that can separate tumors from normal samples based on gene expression data, and robust results were obtained by the hybrid GA/KNN method. Later, Jirapech-Umpai and Aitken [17] applied the GA/KNN approach to leukemia and NCI60 datasets, where the prediction results by the hybrid method are found to be consistent with clinical knowledge, indicating the effectiveness of the hybrid method. Since the simple genetic algorithm (SGA) often converges to a point in the search space, Goldberg and Holland adopted the speciated genetic algorithm, which controls the selection step by handling its fitness with the niching pressure, for gene selection along with artificial neural network (SGANN) [18]. Benchmark results show that SGANN reduces much more features than SGA and performs pretty well [19]. Recently, the hybrid approaches that, respectively, combined Pearson's correlation coefficient (CC) and Relief-F measures with GA were proposed by Chang et al. [20] to select the key features in oral cancer prognosis. These hybrid approaches outperform other popular techniques, such as adaptive neurofuzzy inference system (ANFIS), artificial neural network (ANN), and support vector machine (SVM). In addition to gene selection, the hybrid methods involving evolutionary algorithm have been successfully used to identify SNPs associated with diseases [21, 22] and peptides related to diseases from proteomic profiles [23–25].
Beyond biomarker identification, the evolutionary algorithm based hybrid intelligent methods have also been successfully applied to other feature selection problems in bioinformatics. For example, Zhao et al. [26] proposed a novel hybrid method based on GA and support vector machine (SVM) to select informative features from motif content and protein composition for protein classification, where the principal component analysis (PCA) was further used to reduce the dimensionality while GA was utilized to select a subset of features as well as optimize the regularization parameters of SVM at the same time. Results on benchmark datasets show that the hybrid method is really effective and robust. The hybrid method that integrates SVM and GA was also successfully used to select SNPs [27] and genes [28] associated with certain phenotypes and predict protein subnuclear localizations based on physicochemical composition features [29]. Recently, the hybrid SVM/GA approach was also utilized for selecting the optimum combinations of specific histone epigenetic marks to predict enhancers [30]. Saeys et al. predicted splice sites from nucleotide acid sequence by utilizing the hybrid method combining SVM and estimation of distribution algorithms (EDA) that is similar to GA [31]. Nemati et al. further combined GA and ant colony optimization (ACO) together for feature selection, and the hybrid method was found to outperform either GA or ACO alone when predicting protein functions [32]. In addition, Kamath et al. [33] proposed a feature generation with an evolutionary algorithm (FG-EA) approach, which employs a standard GP algorithm to explore the space of potentially useful features of sequence data. The features obtained from FG-EA enable the SVM classifier to get higher precision.
Feature selection is an important topic in bioinformatics and is involved in the analysis of various kinds of data. The hybrid methods that utilize the evolutionary algorithm have been proven useful for feature selection when handling the complex biological data due to their efficiency and robustness.
4. Parameter Estimation in Modeling Biological Systems
In bioinformatics, one biological system can be modeled as a set of ordinary differential equations (ODEs) so that the dynamics of the systems can be investigated and simulated. For example, Zhan and Yeung modeled a molecular pathway with the following ODEs [34]:
(1) |
where x ∈ R n is the state vector of the system, θ ∈ R k is a parameter vector, u(t) ∈ R p is the system's input, y ∈ R m is the measured data, η(t) ~ N(0, σ 2) is the Gaussian white noise, and x 0 denotes the initial state. f is designed as a set of nonlinear transition functions to represent the dynamical properties of the biological system and g is a measurement function. It can be seen that, to make the model work, it is necessary to estimate the parameters in the model, which can be transformed into an optimization problem as follows:
(2) |
where , ||·||l denotes the l-norm, is the variable at time t j with parameter , w ij denotes the weight, and means the estimated value. The problem P could be solved easily by employing the evolutionary algorithms [35–37]. For example, Katsuragi et al. [38] employed GA to estimate the parameters required by the simulation of dynamics of the metabolite concentrations, and Ueda et al. [39] applied the real-coded genetic algorithm to find the optimal values of the parameters. Recently, in order to improve the accuracy of parameter estimation, Abdullah et al. [40] proposed a novel approach that combines differential evolution (DE) with the firefly algorithm (FA), which outperformed other well-known approaches, such as particle swarm optimization (PSO) and Nelder-Mead algorithm.
In biological experiments, most data observed are measured at discrete time points while the traditional ODE model is a set of continuous equations, which makes it difficult to estimate the parameters in an accurate way. Therefore, the S-system, which is a type of power-law formalism and a particular type of ODE model, was widely used instead. For example, Savageau and Rosen [41] modeled the genetic network with the following S-system model:
(3) |
where X i denotes the variable or reactant, n and m, respectively, denote the number of dependent and independent variables, α i and β i are nonnegative rate constants, and g ij and h ij are kinetic orders. Here, the parameters α i, β i, g ij, and h ij must be estimated. To optimize the parameters, Tominaga and Okamoto [42] utilized GA to approach the optimization problem with the following evaluation function E:
(4) |
where T is the number of sampling points and X i(t) and X i′(t), respectively, denote experimentally observed and estimated value at time t for X i. Later, Kikuchi et al. [43] found that it is difficult to estimate all the parameters from limited time-course data of metabolite concentrations. Hence, they changed the evaluation function E as follows:
(5) |
where c is a penalty constant that balances the two evaluation terms. Moreover, they adopted the simplex operations [44] instead of the random ones to accelerate the searching in GA. Considering only a few genes affecting both the synthesis and degradation processes of specific genes, Noman and Iba [45] further simplified the evaluation function as follows:
(6) |
where K i,j is the kinetic order of gene i. With this objective function, they adopted a novel hybrid evolutionary algorithm, namely, memetic algorithm (MA) [46], that combines global optimization and local search together to find the optimal solutions. Considering that the traditional S-system can only describe instantaneous interactions, Chowdhury et al. [47] introduced the time-delay parameters to represent the system dynamics and refined the evaluation function as follows:
(7) |
where r i is the number of all actual regulators, B i is a balancing factor between the two terms, and C i is the penalty factor for gene i. The trigonometric differential evolution (TDE) technique was adopted to estimate the set of parameters because of its better performance than other traditional evolutionary algorithms.
Parameter estimation is a key step in mathematical modeling of biological systems, which is however a nontrivial task considering the possible huge search space. Due to its excellent searching capability, the evolutionary algorithm is able to help determine the model parameters along with other intelligent approaches.
5. Molecular Network/Pathway Reconstruction
Recently, the network biology that represents a biological system as a molecular network or graph is attracting more and more attention. In the molecular network, the nodes denote the molecules, for example, proteins and metabolites, while edges denote the interactions/regulations or other functional links between nodes. Although it is easy to observe the activity of thousands of molecules at the same time with high-throughput screening, it is not possible to detect the potential interactions/regulations between molecules right now.
Under the circumstances, a lot of intelligent methods have been presented to reconstruct the molecular networks, such as Boolean network and Bayesian network. When reconstructing the molecular networks, one critical step is to determine the topology of the network to be modeled, based on which the interactions/regulations between molecules can be investigated. The topology determination problem can be treated as an optimization problem that is ready to be solved with the help of the evolutionary algorithm.
Take a gene regulatory network as an example; Figure 3 shows the flowchart of reconstructing the regulatory network based on gene expression data by utilizing Boolean network and evolutionary algorithm. In the example, we want to reconstruct the regulatory circuit that controls the gene expression of five genes. Since at least one edge exists while at most 10 edges exist in the network, the number of possible network structures will be M = ∑i=1 10 C 10 i = 210 − 1 ≈ 210. It is impossible to validate all network topologies by biologists in lab. With appropriate fitness function, the evolutionary algorithm is able to identify the optimal network structure that fits best the gene expression data, where the consistence between network topology and gene expression data is evaluated with Boolean network based on certain rules.
Repsilber et al. [48] modeled the gene regulatory network with a Boolean model as a directed acyclic graph G = (V, F), where V = {x 1, x 2,…, x n} denotes the set of genes in the regulatory network and F = {f 1, f 2,…, f n} denotes the Boolean rules that describe the regulations between nodes (or genes). To determine the topology of the regulatory network that better fits the observed data, they employed GA with the following fitness function f:
(8) |
where δ ijk = (sim_dataijk − network_outputijk) is the difference between the observed data and those estimated from the generated network. In this way, they successfully reconstructed the gene regulatory network that generates the expression profiles consistent with experiments.
Later, Mendoza and Bazzan [49] presented inconsistency ratio (IR) to evaluate each individual node in the network, where the IR is defined as follows:
(9) |
Here, k = 1,2, 3,…, 2K is the number of possible input combinations for a node, w k(0) denotes the weight of measurements with output of 0 while w k(1) denotes those with output of 1, and w is the sum of all weights. With the IR defined above, an evaluation function defined below was used to investigate the inconsistency between the network generated and the experimental data:
(10) |
where N × 0.5 denotes the maximum inconsistency to be generated by the network while (NP/N 2) is a penalty factor. With this evaluation function, the differential evolution (DE) approach was used to identify the optimal network structure [50].
Recently, to understand the signaling in distinct physiological situations, Terfve et al. [51] proposed a CellNOptR approach, which derives a Boolean logic model from a “prior knowledge network” and uses GA to search the optimal network structure that is consistent with the perturbation data. Later, Crespo et al. [52] employed Boolean logic model and genetic algorithm to predict missing gene expression values from experimental data and obtained promising results.
Although the Boolean network is simple and capable of handling large networks, it fails to provide quantitative information about regulations between molecules, which is however the key to understand the regulation process. In this case, the Bayesian network is widely adopted. Considering the expensive computation time required by Bayesian network, the evolutionary algorithm is widely used to determine the structures of the molecular networks modeled. In the Bayesian network, the molecular network is regarded as a directed acyclic graph described as follows:
(11) |
where x i denotes node i in the set of variables, that is, the molecules considered, and π i denotes the parent node of x i. For example, Yu et al. [53] utilized GA to determine the optimal network structure consistent with experimental data along with the dynamic Bayesian network by defining an evaluation function based on Bayesian dirichlet equivalence (BDe) score and Bayesian information criterion (BIC) score. Later, Xing and Wu [54] employed the maximum likelihood (ML) score and the minimal description length (MDL) score as fitness values and determined the topology of gene regulatory networks with GA, where the regulatory network is modeled with Bayesian network. Recently, Li and Ngom [55] proposed a new high-order dynamic Bayesian network (HO-DBN) learning approach to identify genetic regulatory networks from gene expression time-series data and obtained the optimal structure of the networks with GA. In their method, the optimal structure was estimated by the maximum likelihood as follows:
(12) |
where X = {x 1, x 2,…, x n} and θ s = {θ 1, θ 2,…, θ n} is the parameter set.
In addition to Boolean and Bayesian networks, the Petri net [56] is also widely employed to reconstruct biological networks. For example, in the Petri net model of metabolic networks, the nodes named places denote metabolites or products while transitions representing reactions are edges, where the values accompanying transitions denote rate constants. The input places for a transition denote the reaction's reactants while the output places denote its products, and the value of a place can be represented by its corresponding amount of substance. If a transition is deleted, a reaction happens, in which reactants are consumed and products are yielded. To find the optimal solutions, Nummela and Juistrom [57] defined a fitness function F as follows:
(13) |
where c mi means the computed concentration of the mth metabolite at time i, c mi0 is the corresponding target concentration, n m means the number of metabolites, n p is the number of time steps, and n r is the number of reactions. With the hybrid method combining the Petri net and GA, they successfully identified a network that is consistent with the simulated data. Later, Koh et al. [58] have also successfully employed this hybrid method to model the AKt and MAPK signaling pathways.
The molecular networks enable one to investigate the biological systems from a systematic perspective, whereas the network topology is the key to construct and understand the network. Accumulating evidence demonstrates that the hybrid heuristic methods involving evolutionary algorithm are able to help determine the network topology consistent with experimental data in an accurate way due to its significant efficiency.
6. Conclusions
In this paper, we surveyed the applications of hybrid intelligent methods, which combine several traditional intelligent approaches together, in bioinformatics. Especially, we introduced the hybrid methods involving evolutionary algorithm and their applications in three common problems in bioinformatics, that is, feature selection, parameter estimation, and reconstruction of biological networks. The evolutionary algorithm was selected here due to its capability of finding global optimal solutions and its robustness. The hybrid intelligent approaches that combine evolutionary algorithm together with other standard intelligent approaches have been proved extremely useful in the above three topics. We hope this review can help the researchers from both bioinformatics and informatics to understand each other and boost their future collaborations. We believe that, with more effective hybrid intelligent methods introduced in the future, it will become relatively easier to analyze the ever-growing complex biological data.
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (91130032 and 61103075), the Innovation Program of Shanghai Municipal Education Commission (13ZZ072), and Shanghai Pujiang Program (13PJD032).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
- 1.Barrett T, Edgar R. Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods in Enzymology. 2006;411:352–369. doi: 10.1016/S0076-6879(06)11019-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Abecasis GR, Auton A, Brooks LD, et al. An integrated map of genetic variation from 1, 092 human genomes. Nature. 2012;491(7422):56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531–527. doi: 10.1126/science.286.5439.531. [DOI] [PubMed] [Google Scholar]
- 4.Ramaswamy S, Tamayo P, Rifkin R, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America. 2001;98(26):15149–15154. doi: 10.1073/pnas.211566398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Liu KQ, Liu ZP, Hao JK, et al. Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics. 2012;13, article 126 doi: 10.1186/1471-2105-13-126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lanckriet GR, Deng M, Cristianini N, Jordan MI, Noble WS. Kernel-based data fusion and its application to protein function prediction in yeast. Pacific Symposium on Biocomputing. 2004:300–311. doi: 10.1142/9789812704856_0029. [DOI] [PubMed] [Google Scholar]
- 7.Barh D, Gupta K, Jain N, et al. Conserved host-pathogen PPIs. Globally conserved inter-species bacterial PPIs based conserved host-pathogen interactome derived novel target in C. pseudotuberculosis, C. diphtheriae, M. tuberculosis, C. ulcerans, Y. pestis, and E. coli targeted by Piper betel compounds. Integrative Biology. 2013;5(3):495–509. doi: 10.1039/c2ib20206a. [DOI] [PubMed] [Google Scholar]
- 8.Campillos M, Kuhn M, Gavin A-C, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science. 2008;321(5886):263–266. doi: 10.1126/science.1158140. [DOI] [PubMed] [Google Scholar]
- 9.Zhao X-M, Wang R-S, Chen L, Aihara K. Uncovering signal transduction networks from high-throughput data by integer linear programming. Nucleic Acids Research. 2008;36(9, article e48) doi: 10.1093/nar/gkn145. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Fraser AS. Simulation of genetic systems by automatic digital computers. I. Introduction. Australian Journal of Biological Sciences. 1957;10:484–491. [Google Scholar]
- 11.Holland JH. Adaptation in Natural and Artificial Systems: an Introductory analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press; 1992. [Google Scholar]
- 12.Barricell NA. Numerical testing of evolution theories. Journal of Statistical Computation and Simulation. 1972;1(2):97–127. [Google Scholar]
- 13.Rechenberg I. Evolutionsstrategie: Optimierung Technischer Systeme Nach Prinzipien Der Biologischen Evolution. Technical University of Berlin; 1971. [Google Scholar]
- 14.Fogel LJ, Owens AJ, Walsh MJ. Artificial Intelligence Through Simulated Evolution. John Wiley & Sons; 1966. [Google Scholar]
- 15.Storn R, Price K. Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization. 1997;11(4):341–359. [Google Scholar]
- 16.Li L, Weinberg CR, Darden TA, Pedersen LG. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics. 2002;17(12):1131–1142. doi: 10.1093/bioinformatics/17.12.1131. [DOI] [PubMed] [Google Scholar]
- 17.Jirapech-Umpai T, Aitken S. Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes. BMC Bioinformatics. 2005;6, article 148 doi: 10.1186/1471-2105-6-148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Goldberg DE, Holland JH. Genetic algorithms and machine learning. Machine Learning. 1988;3(2):95–99. [Google Scholar]
- 19.Hong J-H, Cho S-B. Efficient huge-scale feature selection with speciated genetic algorithm. Pattern Recognition Letters. 2006;27(2):143–150. [Google Scholar]
- 20.Chang SW, Abdul-Kareem S, Merican AF, et al. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods. BMC Bioinformatics. 2013;14, article 170 doi: 10.1186/1471-2105-14-170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Mahdevar G, Zahiri J, Sadeghi M, Nowzari-Dalini A, Ahrabian H. Tag SNP selection via a genetic algorithm. Journal of Biomedical Informatics. 2010;43(5):800–804. doi: 10.1016/j.jbi.2010.05.011. [DOI] [PubMed] [Google Scholar]
- 22.Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. American Journal of Human Genetics. 2004;74(1):106–120. doi: 10.1086/381000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Baggerly KA, Morris JS, Wang J, Gold D, Xiao L-C, Coombes KR. A comprehensive approach to the analysis of matrix-assisted laser desorption/ionization-time of flight proteomics spectra from serum samples. Proteomics. 2003;3(9):1667–1672. doi: 10.1002/pmic.200300522. [DOI] [PubMed] [Google Scholar]
- 24.Petricoin EF, III, Ardekani AM, Hitt BA. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet. 2002;359:572–577. doi: 10.1016/S0140-6736(02)07746-2. [DOI] [PubMed] [Google Scholar]
- 25.Li L, Tang H, Wu Z, et al. Data mining techniques for cancer detection using serum proteomic profiling. Artificial Intelligence in Medicine. 2004;32(2):71–83. doi: 10.1016/j.artmed.2004.03.006. [DOI] [PubMed] [Google Scholar]
- 26.Zhao X-M, Cheung Y-M, Huang D-S. A novel approach to extracting features from motif content and protein composition for protein sequence classification. Neural Networks. 2005;18(8):1019–1028. doi: 10.1016/j.neunet.2005.07.002. [DOI] [PubMed] [Google Scholar]
- 27.Gong B, Guo Z, Li J, et al. Application of a genetic algorithm -Support vector machine hybrid for prediction of clinical phenotypes based on genome-wide SNP profiles of sib pairs. Proceedings of the 2nd International Confernce on Fuzzy Systems and Knowledge Discovery (FSKD '05); August 2005; pp. 830–835. [Google Scholar]
- 28.Li L, Jiang W, Li X, et al. A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset. Genomics. 2005;85(1):16–23. doi: 10.1016/j.ygeno.2004.09.007. [DOI] [PubMed] [Google Scholar]
- 29.Huang W-L, Tung C-W, Huang H-L, Hwang S-F, Ho S-Y. ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. BioSystems. 2007;90(2):573–581. doi: 10.1016/j.biosystems.2007.01.001. [DOI] [PubMed] [Google Scholar]
- 30.Fernández M, Miranda-Saavedra D. Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines. Nucleic Acids Research. 2012;40(10):p. e77. doi: 10.1093/nar/gks149. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Saeys Y, Degroeve S, Aeyels D, Rouzé P, Van de Peer Y. Feature selection for splice site prediction: a new method using EDA-based feature ranking. BMC Bioinformatics. 2004;5, article 64 doi: 10.1186/1471-2105-5-64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Nemati S, Basiri ME, Ghasem-Aghaee N, Aghdam MH. A novel ACO-GA hybrid algorithm for feature selection in protein function prediction. Expert Systems with Applications. 2009;36(10):12086–12094. [Google Scholar]
- 33.Kamath U, Compton J, Islamaj-Dogan R, De Jong KA, Shehu A. An evolutionary algorithm approach for feature generation from sequence data and its application to DNA splice site prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(5):1387–1398. doi: 10.1109/TCBB.2012.53. [DOI] [PubMed] [Google Scholar]
- 34.Zhan C, Yeung LF. Parameter estimation in systems biology models using spline approximation. BMC Systems Biology. 2011;5, article 14 doi: 10.1186/1752-0509-5-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA, Blom JG. Systems biology: parameter estimation for biochemical models. FEBS Journal. 2009;276(4):886–902. doi: 10.1111/j.1742-4658.2008.06844.x. [DOI] [PubMed] [Google Scholar]
- 36.Banga JR, Balsa-Canto E. Parameter estimation and optimal experimental design. Essays in Biochemistry. 2008;45:195–209. doi: 10.1042/BSE0450195. [DOI] [PubMed] [Google Scholar]
- 37.Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research. 2003;13(11):2467–2474. doi: 10.1101/gr.1262503. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Katsuragi T, Ono N, Yasumoto K, et al. SS-mPMG and SS-GA: tools for finding pathways and dynamic simulation of metabolic networks. Plant Cell Physiology. 2013;54(5):728–739. doi: 10.1093/pcp/pct052. [DOI] [PubMed] [Google Scholar]
- 39.Ueda T, Tominaga D, Araki N, et al. Estimate hidden dynamic profiles of siRNA effect on apoptosis. BMC Bioinformatics. 2013;14, article 97 doi: 10.1186/1471-2105-14-97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Abdullah A, Deris S, Anwar S, Arjunan SN. An evolutionary firefly algorithm for the estimation of nonlinear biological model parameters. PloS One. 2013;8(3) doi: 10.1371/journal.pone.0056310.e56310 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Savageau MA, Rosen R. Biochemical Systems Analysis: A Study of Function and Design in molecular Biology ( Addison-Wesley; 1976. [Google Scholar]
- 42.Tominaga D, Okamoto M. Design of canonical model describing complex nonlinear dynamics. Proceedings of the 7th International Conference on Computer Applications in Biotechnology; 1998; pp. 85–90. [Google Scholar]
- 43.Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M. Dynamics modeling of genetic networks using genetic algorithm and S-system. Bioinformatics. 2003;19(5):643–650. doi: 10.1093/bioinformatics/btg027. [DOI] [PubMed] [Google Scholar]
- 44.Tsutsui S, Yamamura M, Higuchi T. Multi-parent recombination with simplex crossover in real coded genetic algorithms. Proceedings of the Genetic and Evolutionary Computation Conference; 1999; pp. 657–664. [Google Scholar]
- 45.Noman N, Iba H. Reverse engineering genetic networks using evolutionary computation. Genome Informatics. 2005;16(2):205–214. [PubMed] [Google Scholar]
- 46.Moscato P. On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Caltech Concurrent Computation Program. 1989;(826)
- 47.Chowdhury AR, Chetty M, Vinh NX. Incorporating time-delays in S-System model for reverse engineering genetic networks. BMC Bioinformatics. 2013;14, article 196 doi: 10.1186/1471-2105-14-196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Repsilber D, Liljenström H, Andersson SGE. Reverse engineering of regulatory networks: simulation studies on a genetic algorithm approach for ranking hypotheses. BioSystems. 2002;66(1-2):31–41. doi: 10.1016/s0303-2647(02)00019-9. [DOI] [PubMed] [Google Scholar]
- 49.Mendoza MR, Bazzan ALC. Evolving random boolean networks with genetic algorithms for regulatory networks reconstruction. Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference (GECCO '11); July 2011; pp. 291–298. [Google Scholar]
- 50.Esmaeili A, Jacob C. A multi-objective differential evolutionary approach toward more stable gene regulatory networks. BioSystems. 2009;98(3):127–136. doi: 10.1016/j.biosystems.2009.09.002. [DOI] [PubMed] [Google Scholar]
- 51.Terfve C, Cokelaer T, Henriques D, et al. CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms. BMC Systems Biology. 2012;6, article 133 doi: 10.1186/1752-0509-6-133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Crespo I, Krishna A, Le Bechec A, del Sol A. Predicting missing expression values in gene regulatory networks using a discrete logic modeling optimization guided by network stable states. Nucleic Acids Research. 2013;41(1, article e8) doi: 10.1093/nar/gks785. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics. 2004;20(18):3594–3603. doi: 10.1093/bioinformatics/bth448. [DOI] [PubMed] [Google Scholar]
- 54.Xing Z, Wu D. Modeling multiple time units delayed gene regulatory network using dynamic Bayesian network. Proceedings of the 6th IEEE International Conference on Data Mining—Workshops (ICDM '06); December 2006; pp. 190–195. [Google Scholar]
- 55.Li Y, Ngom A. The max-min high-order dynamic Bayesian network learning for identifying gene regulatory networks from time-series microarray data. Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '13); 2013; pp. 83–90. [DOI] [PubMed] [Google Scholar]
- 56.Rozenberg G, Engelfriet E. Elementary net systems. Lectures on Petri Nets I: Basic Models. 1998;1497:12–121. [Google Scholar]
- 57.Nummela J, Juistrom BA. Evolving, petri nets to represent metabolic pathways. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '05); June 2005; pp. 2133–2139. [Google Scholar]
- 58.Koh G, Teong HFC, Clément M-V, Hsu D, Thiagarajan PS. A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics. 2006;22(14):e271–e280. doi: 10.1093/bioinformatics/btl264. [DOI] [PubMed] [Google Scholar]