Abstract
Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. This primer covers the theoretical basis of the approach, several practical examples and a software toolbox for performing the calculations.
Flux balance analysis (FBA) is a widely used approach for studying biochemical networks, in particular the genome-scale metabolic network reconstructions that have been built in the past decade1–5. These network reconstructions contain all of the known metabolic reactions in an organism and the genes that encode each enzyme. FBA calculates the flow of metabolites through this metabolic network, thereby making it possible to predict the growth rate of an organism or the rate of production of a biotechnologically important metabolite. With metabolic models for 35 organisms already available (http://systemsbiology.ucsd.edu/In_Silico_Organisms/Other_Organisms) and high-throughput technologies enabling the construction of many more each year6, 7, FBA is an important tool for harnessing the knowledge encoded in these models.
In this primer, we illustrate the principles behind FBA by applying it to the prediction of the specific growth rate of Escherichia coli in the presence and absence of oxygen. The principles outlined can be applied in many other contexts to analyze the phenotypes and capabilities of organisms with different environmental and genetic perturbations (a supplementary tutorial provides six additional worked examples with figures and computer code).
Flux balance analysis is based on constraints
The first step in FBA is to mathematically represent metabolic reactions (Box 1). The core feature of this representation is a tabulation, in the form of a numerical matrix, of the stoichiometric coefficients of each reaction (Fig. 1a,b). These stoichiometries impose constraints on the flow of metabolites through the network. Constraints such as these lie at the heart of FBA, differentiating the approach from theory-based models that depend on biophysical equations requiring many difficult-to-measure kinetic parameters8, 9.
Box 1: Mathematical representation of metabolism.
Metabolic reactions are represented as a stoichiometric matrix (S), of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions). The entries in each column are the stoichiometric coefficients of the metabolites participating in a reaction. There is a negative coefficient for every metabolite consumed, and a positive coefficient for every metabolite that is produced. A stoichiometric coefficient of zero is used for every metabolite that does not participate in a particular reaction. S is a sparse matrix since most biochemical reactions involve only a few different metabolites. The flux through all of the reactions in a network is represented by the vector v, which has a length of n. The concentrations of all metabolites are represented by the vector x, with length m. The system of mass balance equations at steady state (dx/dt = 0) is given in Fig. 1c (ref. 23):
Sv = 0
Any v that satisfies this equation is said to be in the null space of S. In any realistic large-scale metabolic model, there are more reactions than there are compounds (n > m). In other words, there are more unknown variables than equations, so there is no unique solution to this system of equations.
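To make the matrix representation concrete, the following minimal sketch builds S for a small network and tests whether two flux vectors satisfy the steady-state mass balance. The network is hypothetical: the reaction names, metabolite names and stoichiometric coefficients are illustrative assumptions and are not taken from the E. coli core model.

```python
# Hypothetical toy network (NOT the E. coli core model): three metabolites and
# five reactions. Negative coefficients are consumed, positive are produced.
metabolites = ["glc", "o2", "atp"]
reactions = {
    "EX_glc":  {"glc": 1},                        # glucose uptake
    "EX_o2":   {"o2": 1},                         # oxygen uptake
    "RESP":    {"glc": -1, "o2": -2, "atp": 10},  # respiration (illustrative yield)
    "FERM":    {"glc": -1, "atp": 2},             # fermentation (illustrative yield)
    "BIOMASS": {"atp": -1},                       # drain standing in for growth
}

# S has one row per metabolite (m rows) and one column per reaction (n columns).
# Metabolites that do not take part in a reaction get a coefficient of zero.
S = [[coeffs.get(met, 0) for coeffs in reactions.values()] for met in metabolites]


def mass_balance(S, v):
    """Return S v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


m, n = len(S), len(S[0])

v_steady = [10, 20, 10, 0, 100]     # all glucose is respired
v_unbalanced = [10, 0, 10, 0, 100]  # respiration without any oxygen uptake

print(mass_balance(S, v_steady))      # every entry is zero: v lies in the null space
print(mass_balance(S, v_unbalanced))  # the oxygen row is non-zero: not a steady state
print(n - m)                          # free fluxes left over (rows of S are independent here)
```

Because this toy network has five reactions and only three independent balances, two fluxes remain undetermined, which is the small-scale analogue of n > m in a genome-scale model.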
Although constraints define a range of solutions, it is still possible to identify and analyze single points within the solution space. For example, we may be interested in identifying which point corresponds to the maximum growth rate or to maximum ATP production of an organism, given its particular set of constraints. FBA is one method for identifying such optimal points within a constrained space (Fig. 2).
FBA seeks to maximize or minimize an objective function Z = c^T v, which can be any linear combination of fluxes, where c is a vector of weights indicating how much each reaction (such as the biomass reaction when simulating maximum growth) contributes to the objective function. In practice, when only one reaction is desired for maximization or minimization, c is a vector of zeros with a one at the position of the reaction of interest (Fig. 1d).
Optimization of such a system is accomplished by linear programming (Fig. 1e). FBA can thus be defined as the use of linear programming to solve the equation Sv = 0 given a set of upper and lower bounds on v and a linear combination of fluxes as an objective function. The output of FBA is a particular flux distribution, v, which maximizes or minimizes the objective function.
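Collecting the pieces defined in this box, the optimization problem can be written compactly as follows. The symbols lb and ub for the vectors of lower and upper flux bounds are notation assumed here for illustration; the text above refers to them only as bounds on v.

```latex
\begin{aligned}
\max_{v} \quad & Z = c^{T} v \\
\text{subject to} \quad & S v = 0, \\
& lb_j \le v_j \le ub_j, \qquad j = 1, \dots, n
\end{aligned}
```

Minimization has the same form with max replaced by min. When c contains a single one at the position of the biomass reaction, Z is simply the flux through that reaction.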
Constraints are represented in two ways, as equations that balance reaction inputs and outputs and as inequalities that impose bounds on the system. The matrix of stoichiometries imposes flux (that is, mass) balance constraints on the system, ensuring that the total amount of any compound being produced must be equal to the total amount being consumed at steady state (Fig. 1c). Every reaction can also be given upper and lower bounds, which define the maximum and minimum allowable fluxes of the reactions. These balances and bounds define the space of allowable flux distributions of a system—that is, the rates at which every metabolite is consumed or produced by each reaction. Other constraints can also be added10.
From constraints to optimizing a phenotype
The next step in FBA is to define a biological objective that is relevant to the problem being studied (Fig. 1d). In the case of predicting growth, the objective is biomass production, the rate at which metabolic compounds are converted into biomass constituents such as nucleic acids, proteins, and lipids. Mathematically, the objective is represented by an ‘objective function’ that indicates how much each reaction contributes to the phenotype. A ‘biomass reaction’ that drains precursor metabolites from the system at their relative stoichiometries to simulate biomass production is selected by the objective function in order to predict growth rates. This reaction is scaled so that the flux through it is equal to the exponential growth rate (µ) of the organism.
Taken together, the mathematical representations of the metabolic reactions and of the phenotype define a system of linear equations. In flux balance analysis, these equations are solved using linear programming. Many computational linear programming algorithms exist, and they can very quickly identify optimal solutions to large systems of equations. The COBRA Toolbox11 is a freely available Matlab toolbox for performing these calculations (Box 2).
Box 2: Tools for flux balance analysis.
FBA computations, which fall into the category of constraint-based reconstruction and analysis (COBRA) methods, can be performed using several available tools24–26. The COBRA Toolbox11 is a freely available Matlab toolbox (http://systemsbiology.ucsd.edu/Downloads/Cobra_Toolbox) that can be used to perform a variety of COBRA methods, including many FBA-based methods. Models for the COBRA Toolbox are saved in the Systems Biology Markup Language (SBML)27 format and can be loaded with the function readCbModel. The E. coli core model28 used in this Primer is included in the toolbox.
In Matlab, the models are structures with fields, such as ‘rxns’ (a list of all reaction names), ‘mets’ (a list of all metabolite names) and ‘S’ (the stoichiometric matrix). The function ‘optimizeCbModel’ is used to perform FBA. To change the bounds on reactions, use the function ‘changeRxnBounds’. The Supplementary Tutorial contains examples of COBRA toolbox code for performing FBA.
In the growth example, suppose we are interested in the aerobic growth of E. coli under the assumption that uptake of glucose, and not oxygen, is the limiting constraint on growth. This involves capping the maximum rate of glucose uptake to a physiologically realistic level (e.g. 18.5 mmol glucose gDW−1 hr−1) and setting the maximum rate of oxygen uptake to an unrealistically high level, so that it does not constrain growth. Then, linear programming is used to determine the flux through the metabolic network that maximizes growth rate, resulting in a predicted exponential growth rate of 1.65 hr−1. (See the Supplementary Tutorial for computer code.)
Anaerobic growth of E. coli can be calculated by constraining the maximum rate of uptake of oxygen to zero and solving the system of equations, resulting in a predicted growth rate of 0.47 hr−1. Studies have shown that these predicted aerobic and anaerobic growth rates agree well with experimental measurements12.
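The aerobic and anaerobic calculations can be imitated on a very small scale. The sketch below is not the COBRA Toolbox and does not use the E. coli core model: it assumes a hypothetical five-reaction network with made-up ATP yields, and it finds the optimum by checking every vertex of the feasible space, which is practical only for tiny problems (a real analysis would call a linear programming solver). The function names solve_square and toy_fba are illustrative. The resulting numbers show only the qualitative effect of removing oxygen and are not comparable to the 1.65 hr−1 and 0.47 hr−1 quoted above.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    k = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[k] for row in M]


def toy_fba(S, lb, ub, c):
    """Maximize c.v subject to S v = 0 and lb <= v <= ub by checking every vertex.

    Assumes finite bounds and linearly independent rows of S.
    Returns (objective value, flux vector), or None if nothing is feasible.
    """
    m, n = len(S), len(S[0])
    best = None
    for basic in combinations(range(n), m):
        fixed = [j for j in range(n) if j not in basic]
        A = [[S[i][j] for j in basic] for i in range(m)]
        for values in product(*[(lb[j], ub[j]) for j in fixed]):
            # Fixed fluxes sit at a bound; move them to the right-hand side.
            rhs = [-sum(S[i][j] * x for j, x in zip(fixed, values)) for i in range(m)]
            solved = solve_square(A, rhs)
            if solved is None:
                break  # these columns are dependent; try another set
            v = [Fraction(0)] * n
            for j, x in zip(fixed, values):
                v[j] = Fraction(x)
            for j, x in zip(basic, solved):
                v[j] = x
            if all(lb[j] <= v[j] <= ub[j] for j in range(n)):
                z = sum(c[j] * v[j] for j in range(n))
                if best is None or z > best[0]:
                    best = (z, v)
    return best


# Hypothetical network: columns are EX_glc, EX_o2, RESP, FERM, BIOMASS.
S = [
    [1, 0, -1, -1, 0],  # glc
    [0, 1, -2, 0, 0],   # o2
    [0, 0, 10, 2, -1],  # atp
]
c = [0, 0, 0, 0, 1]     # objective: flux through the BIOMASS drain
lb = [0, 0, 0, 0, 0]
ub_aerobic = [10, 1000, 1000, 1000, 1000]  # glucose capped, oxygen effectively unlimited
ub_anaerobic = [10, 0, 1000, 1000, 1000]   # oxygen uptake constrained to zero

z_aerobic, v_aerobic = toy_fba(S, lb, ub_aerobic, c)
z_anaerobic, v_anaerobic = toy_fba(S, lb, ub_anaerobic, c)
print("aerobic objective:", z_aerobic)      # 100
print("anaerobic objective:", z_anaerobic)  # 20
```

In the aerobic case all glucose is routed through the respiration reaction, whereas setting the oxygen bound to zero forces the flux through fermentation and lowers the objective. The only change between the two runs is a single upper bound.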
Although growth is easy to measure experimentally, computational approaches such as flux balance analysis shine when predicting metabolic reaction fluxes and when simulating growth on different substrates or with genetic manipulations. FBA does not require kinetic parameters and can be computed very quickly even for large networks, so it can be applied in studies that characterize many different perturbations. An example of such a case is given in Supplementary Example 6, which explores the effects on growth of deleting every pairwise combination of 136 E. coli genes to find double gene knockouts that are lethal to the bacteria.
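The size of such a screen is easy to work out, and it shows why fast computation matters. The following sketch only enumerates the gene pairs; the gene labels are placeholders, since the 136 genes are not listed in this primer, and the step that maps each deletion onto reaction bounds and runs FBA is left out.

```python
from itertools import combinations
from math import comb

n_genes = 136
# Placeholder labels; the real gene identifiers are not given in this primer.
genes = [f"gene_{i:03d}" for i in range(1, n_genes + 1)]

# Every unordered pair of distinct genes is one double-deletion simulation.
double_deletions = list(combinations(genes, 2))

print(len(double_deletions))  # 9180 pairs, i.e. 136 * 135 / 2
print(double_deletions[0])    # ('gene_001', 'gene_002')
```

Each of these pairs would require one separate FBA run with the affected reactions constrained to carry zero flux.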
FBA has limitations, however. Because it does not use kinetic parameters, it cannot predict metabolite concentrations. It is also only suitable for determining fluxes at steady state. Except in some modified forms, FBA does not account for regulatory effects such as activation of enzymes by protein kinases or regulation of gene expression, so its predictions may not always be accurate.
The many uses of flux balance analysis
Because the fundamentals of flux balance analysis are simple, the method has found diverse uses in physiological studies, gap-filling efforts and genome-scale synthetic biology3. By altering the bounds on certain reactions, growth on different media (Supplementary Example 1) or with multiple gene knockouts (Supplementary Example 6) can be simulated12. FBA can then be used to predict the yields of important cofactors such as ATP, NADH, or NADPH13 (Supplementary Example 2).
Whereas the example described here yielded a single optimal growth phenotype, in large metabolic networks, it is often possible for more than one solution to lead to the same desired optimal growth rate. For example, an organism may have two redundant pathways that both generate the same amount of ATP, so either pathway could be used when maximum ATP production is the desired phenotype. Such alternate optimal solutions can be identified through flux variability analysis, a method that uses FBA to maximize and minimize every reaction in a network14 (Supplementary Example 3), or by using a mixed-integer linear programming based algorithm15. More detailed phenotypic studies can be performed such as robustness analysis16, in which the effect on the objective function of varying a particular reaction flux can be analyzed (Supplementary Example 4). A more advanced form of robustness analysis involves varying two fluxes simultaneously to form a phenotypic phase plane17 (Supplementary Example 5).
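Flux variability analysis can be stated as a pair of linear programs per reaction. The formulation below is a sketch of the commonly used form, not a transcription from the Supplementary Tutorial: it assumes that the objective is first maximized to give Z*, that Z* is then held fixed, and that lb and ub denote the lower and upper flux bounds.

```latex
\begin{aligned}
Z^{*} &= \max_{v} \; c^{T} v
  && \text{subject to } S v = 0, \;\; lb \le v \le ub \\[4pt]
v_i^{\max} &= \max_{v} \; v_i, \qquad v_i^{\min} = \min_{v} \; v_i
  && \text{subject to } S v = 0, \;\; lb \le v \le ub, \;\; c^{T} v = Z^{*}
\end{aligned}
\qquad i = 1, \dots, n
```

A reaction with v_i^min = v_i^max carries the same flux in every optimal solution, whereas a reaction with v_i^min < v_i^max can vary between alternate optima, as in the case of two redundant ATP-generating pathways.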
All genome-scale metabolic reconstructions are incomplete, as they contain ‘knowledge gaps’ where reactions are missing. FBA is the basis for several algorithms that predict which reactions are missing by comparing in silico growth simulations to experimental results18, 19. Constraint-based models can also be used for metabolic engineering, where FBA-based algorithms such as OptKnock20 can predict gene knockouts that allow an organism to produce desirable compounds21, 22.
This primer and the accompanying tutorials based on the COBRA toolbox (Box 2) should help those interested in harnessing the growing cadre of genome-scale metabolic reconstructions that are becoming available.
Contributor Information
Jeffrey D. Orth, University of California San Diego, La Jolla, CA, USA.
Ines Thiele, University of Iceland, Reykjavik, Iceland.
Bernhard Ø. Palsson, Email: palsson@ucsd.edu, University of California San Diego, La Jolla, CA, USA.
References
1. Duarte NC, et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:1777–1782. doi: 10.1073/pnas.0610772104.
2. Feist AM, et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular systems biology. 2007;3. doi: 10.1038/msb4100155.
3. Feist AM, Palsson BO. The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat Biotech. 2008;26:659–667. doi: 10.1038/nbt1401.
4. Reed JL, Vo TD, Schilling CH, Palsson BO. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome biology. 2003;4:R54.51–R54.12. doi: 10.1186/gb-2003-4-9-r54.
5. Oberhardt MA, Palsson BO, Papin JA. Applications of genome-scale metabolic reconstructions. Molecular systems biology. 2009;5:320. doi: 10.1038/msb.2009.77.
6. Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010;5:93–121. doi: 10.1038/nprot.2009.203.
7. Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO. Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol. 2009;7:129–143. doi: 10.1038/nrmicro1949.
8. Covert MW, et al. Metabolic modeling of microbial strains in silico. Trends Biochem. Sci. 2001;26:179–186. doi: 10.1016/s0968-0004(00)01754-0.
9. Edwards JS, Covert M, Palsson B. Metabolic modeling of microbes: the flux-balance approach. Environmental Microbiology. 2002;4:133–140. doi: 10.1046/j.1462-2920.2002.00282.x.
10. Price ND, Reed JL, Palsson BO. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nat Rev Microbiol. 2004;2:886–897. doi: 10.1038/nrmicro1023.
11. Becker SA, et al. Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox. Nat. Protocols. 2007;2:727–738. doi: 10.1038/nprot.2007.99.
12. Edwards JS, Ibarra RU, Palsson BO. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001;19:125–130. doi: 10.1038/84379.
13. Varma A, Palsson BO. Metabolic capabilities of Escherichia coli: I. Synthesis of biosynthetic precursors and cofactors. Journal of Theoretical Biology. 1993;165:477–502. doi: 10.1006/jtbi.1993.1202.
14. Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic engineering. 2003;5:264–276. doi: 10.1016/j.ymben.2003.09.002.
15. Lee S, Phalakornkule C, Domach MM, Grossmann IE. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Comp Chem Eng. 2000;24:711–716.
16. Edwards JS, Palsson BO. Robustness analysis of the Escherichia coli metabolic network. Biotechnology Progress. 2000;16:927–939. doi: 10.1021/bp0000712.
17. Edwards JS, Ramakrishna R, Palsson BO. Characterizing the metabolic phenotype: a phenotype phase plane analysis. Biotechnology and bioengineering. 2002;77:27–36. doi: 10.1002/bit.10047.
18. Reed JL, et al. Systems approach to refining genome annotation. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:17480–17484. doi: 10.1073/pnas.0603364103.
19. Kumar VS, Maranas CD. GrowMatch: an automated method for reconciling in silico/in vivo growth predictions. PLoS computational biology. 2009;5:e1000308. doi: 10.1371/journal.pcbi.1000308.
20. Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnology and bioengineering. 2003;84:647–657. doi: 10.1002/bit.10803.
21. Feist AM, et al. Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli. Metabolic engineering. 2009. doi: 10.1016/j.ymben.2009.10.003.
22. Park JM, Kim TY, Lee SY. Constraints-based genome-scale metabolic simulation for systems metabolic engineering. Biotechnology advances. 2009;27:979–988. doi: 10.1016/j.biotechadv.2009.05.019.
23. Palsson BO. Systems biology: properties of reconstructed networks. New York: Cambridge University Press; 2006.
24. Jung TS, Yeo HC, Reddy SG, Cho WS, Lee DY. WEbcoli: an interactive and asynchronous web application for in silico design and analysis of genome-scale E.coli model. Bioinformatics (Oxford, England). 2009;25:2850–2852. doi: 10.1093/bioinformatics/btp496.
25. Klamt S, Saez-Rodriguez J, Gilles ED. Structural and functional analysis of cellular networks with CellNetAnalyzer. BMC systems biology. 2007;1:2. doi: 10.1186/1752-0509-1-2.
26. Lee DY, Yun H, Park S, Lee SY. MetaFluxNet: the management of metabolic reaction information and quantitative metabolic flux analysis. Bioinformatics (Oxford, England). 2003;19:2144–2146. doi: 10.1093/bioinformatics/btg271.
27. Hucka M, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (Oxford, England). 2003;19:524–531. doi: 10.1093/bioinformatics/btg015.
28. Orth JD, Fleming RM, Palsson BO. In: EcoSal - Escherichia coli and Salmonella Cellular and Molecular Biology. Karp PD, editor. Washington D.C.: ASM Press; 2009.