Abstract
Motivation: Genome-scale metabolic reconstructions summarize current knowledge about a target organism in a structured manner and as such highlight missing information. Such gaps can be filled algorithmically. Scalability limitations of available algorithms for gap filling hinder their application to compartmentalized reconstructions.
Results: We present fastGapFill, a computationally efficient and tractable extension to the COBRA toolbox that permits the identification of candidate missing knowledge from a universal biochemical reaction database (e.g. Kyoto Encyclopedia of Genes and Genomes) for a given (compartmentalized) metabolic reconstruction. The stoichiometric consistency of the universal reaction database and of the metabolic reconstruction can be tested, permitting the computation of biologically more relevant solutions. We demonstrate the efficiency and scalability of fastGapFill on a range of metabolic reconstructions.
Availability and implementation: fastGapFill is freely available from http://thielelab.eu.
Contact: ines.thiele@uni.lu
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
A biomolecular network reconstruction summarizes biochemical, physiological and genomic knowledge in a mathematically structured electronic format (Palsson, 2006). It can be converted into a computational model, and predictions have been used to accelerate biotechnological and biomedical discoveries (Oberhardt et al., 2010). The predictive capacity and accuracy of a model depend on the comprehensiveness and biochemical fidelity of the reconstruction, with respect to the underlying biochemistry. The comprehensiveness of a genome-scale metabolic reconstruction can be improved by using the model to detect and fill network gaps (Rolfsson et al., 2011). Similarly, reconstruction fidelity can be improved by using the model to detect reconstruction stoichiometry inconsistent with biochemistry (Gevorgyan et al., 2008) or reactions inconsistent with steady state flux (Vlassis et al., 2014).
Existing gap-filling algorithms, reviewed by Orth and Palsson (2010), become intractable in high dimensions. Decompartmentalization of genome-scale compartmentalized metabolic networks reduces their dimension, rendering gap filling tractable (Rolfsson et al., 2011). However, this approach underestimates the amount of missing information because it connects reactions that would normally not co-occur in the same cellular compartment.
We present fastGapFill, the first scalable algorithm capable of efficiently detecting and filling network gaps in compartmentalized genome-scale models. fastGapFill draws on, and extends, fastcore (Vlassis et al., 2014), an algorithm that approximates the cardinality function to identify a compact flux consistent model, in which all reactions carry a non-zero flux in at least one flux distribution. fastGapFill integrates all three notions of model consistency, namely, gap filling, flux consistency and stoichiometric consistency, in a single tool.
2 METHODS
Formulation of the gap-filling problem. In the metabolic gap-filling problem (Reed et al., 2006), one starts with a computational metabolic model, M, that contains at least one blocked reaction, that is, a reaction that is desired but does not admit a non-zero steady state flux. A universal database, e.g. the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), is searched for at least one reaction that, when added to the model, fills at least one gap, such that at least one formerly blocked reaction can carry flux. Among other criteria, it may also be desirable to compute a compact flux consistent model, where the number of added universal reactions is minimal. A specific instance of this problem occurs in metabolic modeling, although our algorithm is applicable to any biochemical network model with gaps.
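The notion of a blocked reaction can be stated compactly. The notation below is assumed for illustration and is not taken from the text: $N$ denotes the stoichiometric matrix of M, $v$ a flux vector, and $l$, $u$ lower and upper flux bounds.

```latex
% Reaction j is blocked if no steady state flux vector within the bounds
% gives it a non-zero flux:
\text{reaction } j \text{ is blocked} \iff
\max \{\, |v_j| \;:\; N v = 0,\ l \le v \le u \,\} = 0 .

% A model is flux consistent if none of its reactions is blocked.
```

Under this reading, gap filling asks for reactions from the universal database whose addition to $N$ as extra columns makes this maximum strictly positive for at least one formerly blocked reaction.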
Computing a compact flux consistent model. We repurposed the recently developed fastcore algorithm (Vlassis et al., 2014) to compute a near-minimal set of reactions that need to be added to an input metabolic model M to render it flux consistent. fastcore takes as inputs M and a core set of reactions C ⊆ M. Then, it greedily expands C by computing a set of modes of M whose overall support contains the whole of C and a minimal set from M \ C. This is achieved by a series of L1-norm regularized linear programs that optimize a relaxed version of an (intractable) integer program under cardinality constraints (Vlassis et al., 2014). Our implementation efficiently identifies blocked reactions.
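As a rough sketch of what such an L1-norm regularized linear program can look like, the block below paraphrases the general form described by Vlassis et al. (2014); it should not be read as the exact program solved by the implementation. All symbols are notation assumed here: $N$ is the stoichiometric matrix, $l$, $u$ are flux bounds, $\mathcal{K}$ is the subset of core reactions forced to be active in this step, $\mathcal{P}$ is the set of penalized non-core reactions, $\varepsilon > 0$ is a small activity threshold and $z$ is an auxiliary variable bounding $|v|$.

```latex
\begin{aligned}
\min_{v,\,z}\quad & \sum_{i \in \mathcal{P}} z_i \\
\text{s.t.}\quad  & N v = 0, \qquad l \le v \le u, \\
                  & v_i \ge \varepsilon        && \forall\, i \in \mathcal{K}, \\
                  & -z_i \le v_i \le z_i       && \forall\, i \in \mathcal{P}.
\end{aligned}
```

Minimizing $\sum_i z_i$ is the L1-norm surrogate for minimizing the number of non-core reactions with non-zero flux, which is the cardinality objective of the intractable integer program.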
Preprocessing to generate a global model. A cellularly compartmentalized metabolic model (S) without blocked reactions (B), so that S and B share no reactions, is expanded by a universal metabolic database U (e.g. KEGG), such that a copy of U is placed in each cellular compartment of S (including the extracellular space), to generate SU. For each metabolite occurring in a non-cytosolic compartment, a reversible intercompartmental transport reaction is added. For each extracellular metabolite, an exchange reaction is added. The union of the latter two reaction sets (X) is added to SU to generate a global model, which is extended with solvable blocked reactions (Bs), that is, reactions that were previously flux inconsistent but become flux consistent when added to the global model. In the extended global model (SUX), all reactions are flux consistent. Note that not all blocked reactions B may be solvable; unsolvable blocked reactions are not present in SUX. All reactions of S and Bs represent the core set.
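The preprocessing steps can be illustrated with a minimal Python sketch. It is not the MATLAB implementation: the function name `build_global_model`, the dictionary representation of reactions, the `met[comp]` naming scheme and the choice to connect every non-cytosolic metabolite to the cytosol are all assumptions made for illustration. The addition of solvable blocked reactions (Bs) is omitted because it requires a flux consistency check.

```python
def build_global_model(model, universal, compartments,
                       cytosol="c", extracellular="e"):
    """Toy construction of SU and X (hypothetical helper, not the COBRA API).

    model:     {reaction: {"met[comp]": coefficient}}  (S, no blocked reactions)
    universal: {reaction: {"met": coefficient}}         (U, no compartments)
    """
    # Start from the model's own reactions (S).
    global_model = {name: dict(stoich) for name, stoich in model.items()}

    # Place one copy of every universal reaction (U) in each compartment.
    for comp in compartments:
        for name, stoich in universal.items():
            global_model[f"{name}_{comp}"] = {
                f"{met}[{comp}]": coef for met, coef in stoich.items()
            }

    # All compartment-tagged metabolites now present in SU.
    metabolites = sorted({m for stoich in global_model.values() for m in stoich})

    transport_and_exchange = {}  # the reaction set X
    for met in metabolites:
        base, _, comp = met[:-1].rpartition("[")
        if comp != cytosol:
            # Reversible transport between this compartment and the cytosol
            # (assumed pairing; reversibility is not encoded in this toy).
            transport_and_exchange[f"TR_{base}_{comp}"] = {
                met: -1, f"{base}[{cytosol}]": 1
            }
        if comp == extracellular:
            # Exchange reaction across the system boundary.
            transport_and_exchange[f"EX_{base}"] = {met: -1}

    global_model.update(transport_and_exchange)
    return global_model, transport_and_exchange


# Hypothetical two-compartment example.
model = {"R1": {"A[c]": -1, "B[c]": 1}}
universal = {"U1": {"B": -1, "C": 1}}
global_model, X = build_global_model(model, universal, ["c", "e"])
```

In this toy case the global model holds the one model reaction, two compartment-specific copies of `U1`, two transport reactions and two exchange reactions. The growth of SUX relative to S in Table 1 reflects the same kind of duplication across compartments.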
Computing a compact flux consistent subnetwork of a global model. fastGapFill computes a subnetwork of SUX, consisting of all core reactions, plus a minimal number of reactions from UX, such that all reactions in the resulting compact subnetwork are flux consistent. This is achieved by using a slightly modified version of fastcore, in which a vector of linear weightings prioritizes the addition of reactions within UX. For instance, one may prioritize the addition of metabolic reactions from U over transport reactions from X, or, by varying the weightings on non-core reactions, alternate compact sets of gap-filling reactions may be identified.
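The effect of the weightings can be illustrated with a deliberately simplified Python sketch. It does not solve the linear programs; instead it compares two fixed, hypothetical candidate sets of gap-filling reactions by their weighted count, which is only a proxy for the weighted L1 objective. The function names, reaction names and weight values are all assumptions.

```python
def make_weights(non_core, transport, w_metabolic=1.0, w_transport=1.0):
    """Assign one weight per non-core reaction (hypothetical helper)."""
    return {r: (w_transport if r in transport else w_metabolic)
            for r in non_core}


def cheapest(candidates, weights):
    """Return the candidate set with the smallest summed weight."""
    return min(candidates, key=lambda c: sum(weights[r] for r in c))


non_core = ["U1_c", "U2_c", "TR_B_e"]   # reactions from UX (made up)
transport = {"TR_B_e"}                  # the subset belonging to X

# Two hypothetical ways of filling the same gap.
candidates = [("U1_c", "U2_c"), ("TR_B_e",)]

equal = make_weights(non_core, transport)
penalized = make_weights(non_core, transport, w_transport=10.0)

choice_equal = cheapest(candidates, equal)          # transport route is cheaper
choice_penalized = cheapest(candidates, penalized)  # metabolic route is cheaper
```

With equal weights the single transport reaction wins; once transport reactions are penalized, the two metabolic reactions from U are preferred. Varying the weight vector in this way is how alternate compact solutions may be explored.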
Optional analysis of gap-filling reactions. We provide the option to compute a flux vector that maximizes the flux through each blocked reaction in turn, while minimizing the Euclidean norm of flux through the subnetwork of SUX computed by one call to fastGapFill. Note that flux through more than one solvable blocked reaction may be necessary to fill a gap, and that the computed flux vector may not be of minimum cardinality.
Computing stoichiometric consistency. Many reaction databases contain stoichiometric inconsistencies (Gevorgyan et al., 2008), where the stoichiometry for at least two reactions is inconsistent with conservation of mass. For instance, a pair of reactions involving the metabolites A, B and C is stoichiometrically inconsistent if no positive molecular mass can be assigned to A, B and C such that the mass on both sides of both reactions is equal. fastGapFill allows stoichiometrically inconsistent reactions to be identified, so that they can be kept from filling gaps, by using the scalable approach for approximate cardinality maximization used within fastcore to compute a maximal set of metabolites in U that are involved in reactions that conserve mass.
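A short worked example makes the mass argument concrete. The reaction pair below is chosen for illustration and is an assumption, not necessarily the pair used in the original text. Writing $m_A$, $m_B$, $m_C$ for the molecular masses:

```latex
% Hypothetical reaction pair:
%   R1:  A -> B
%   R2:  A -> B + C
\begin{aligned}
\text{R1 conserves mass} &\;\Rightarrow\; m_A = m_B, \\
\text{R2 conserves mass} &\;\Rightarrow\; m_A = m_B + m_C, \\
\text{hence}             &\;\phantom{\Rightarrow}\; m_C = 0,
\end{aligned}
% which contradicts the requirement m_C > 0.

% In general, a network with stoichiometric matrix N is stoichiometrically
% consistent if and only if
\exists\, m > 0 \ \text{such that}\ N^{\mathsf{T}} m = 0 .
```

The general condition is the one given by Gevorgyan et al. (2008). Maximizing the number of metabolites with strictly positive mass under $N^{\mathsf{T}} m = 0$ is a cardinality maximization problem, which is why the approximation used within fastcore can be reused here.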
3 IMPLEMENTATION
An open-source MATLAB (MathWorks, Inc.) implementation of fastGapFill is available as a cross-platform desktop computer extension to the openCOBRA toolbox (Schellenberger et al., 2011).
4 DISCUSSION
We applied fastGapFill to five metabolic models (Table 1), demonstrating its broad applicability and scalability for various sizes of the gap-filling problem. Alternate gap-filling solutions can be computed by changing weightings on non-core reactions in the preprocessed problem. Note that all candidate metabolic and transport reactions are hypotheses requiring experimental validation (Rolfsson et al., 2011). Our implementation provides an openCOBRA (Schellenberger et al., 2011) compatible version of the KEGG reaction database; however, any other universal reaction database could be used with fastGapFill, so long as the same input format is maintained and care is taken to correctly identify identical metabolites. fastGapFill is the first scalable approach to identify candidate missing knowledge in compartmentalized metabolic reconstructions, and the approach is applicable to any form of biochemical network gap-filling problem.
Table 1. Gap filling of metabolic reconstructions on a standard desktop computer (Dell, Intel Core i5, 16 GB RAM, 64 bit)
| Model name | Thermotoga maritima | Escherichia coli | Synechocystis sp. | sIEC | Recon 2 |
|---|---|---|---|---|---|
| Reference | (Zhang et al., 2009) | (Feist et al., 2007) | (Nogales et al., 2012) | (Sahoo and Thiele, 2013) | (Thiele et al., 2013) |
| S (a) | 418 × 535 | 1501 × 2232 | 632 × 731 | 834 × 1260 | 3187 × 5837 |
| SUX (a) | 14 020 × 31 566 | 21 614 × 49 355 | 28 174 × 62 866 | 48 970 × 109 522 | 58 672 × 132 622 |
| Comp (b) | 2 | 3 | 4 | 7 | 8 |
| B | 116 | 196 | 132 | 22 | 1603 |
| Bs | 84 | 159 | 100 | 17 | 490 |
| Number of gap-filling reactions | 87 | 138 | 172 | 14 | 400 |
| t_preprocessing (s) (c) | 52 | 237 | 344 | 1003 | 5552 |
| t_fastGapFill (s) | 21 | 238 | 435 | 194 | 1826 |
(a) The dimensions are given as metabolites × reactions.
(b) Comp, compartments.
(c) Preprocessing includes computing the flux consistent metabolic model, merging of UX for all compartments of S and adding solvable blocked reactions Bs.
Note: Equal weighting of all reactions was used. See Supplementary Table S1 for candidate gap-filling solutions.
Funding: I.T. was supported by an ATTRACT program grant (FNR/A12/01) from the Luxembourg National Research Fund (FNR). R.F. was supported by the Interagency Modeling and Analysis Group, Multi-scale Modeling Consortium U01 awards from the National Institute of General Medical Sciences, award GM102098-01, and U.S. Department of Energy, Office of Science, Biological and Environmental Research Program, award ER65524.
Conflict of Interest: none declared.
REFERENCES
- Feist AM, et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol. Syst. Biol. 2007;3:121. doi: 10.1038/msb4100155.
- Gevorgyan A, et al. Detection of stoichiometric inconsistencies in biomolecular models. Bioinformatics. 2008;24:2245–2251. doi: 10.1093/bioinformatics/btn425.
- Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- Nogales J, et al. Detailing the optimality of photosynthesis in cyanobacteria through systems biology analysis. Proc. Natl Acad. Sci. USA. 2012;109:2678–2683. doi: 10.1073/pnas.1117907109.
- Oberhardt MA, et al. Metabolic network analysis of Pseudomonas aeruginosa during chronic cystic fibrosis lung infection. J. Bacteriol. 2010;192:5534–5548. doi: 10.1128/JB.00900-10.
- Orth J, Palsson BØ. Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng. 2010;107:403–412. doi: 10.1002/bit.22844.
- Palsson BØ. Systems Biology: Properties of Reconstructed Networks. New York: Cambridge Univ Press; 2006.
- Reed JL, et al. Systems approach to refining genome annotation. Proc. Natl Acad. Sci. USA. 2006;103:17480–17484. doi: 10.1073/pnas.0603364103.
- Rolfsson O, et al. The human metabolic reconstruction Recon 1 directs hypotheses of novel human metabolic functions. BMC Syst. Biol. 2011;5:155. doi: 10.1186/1752-0509-5-155.
- Sahoo S, Thiele I. Predicting the impact of diet and enzymopathies on human small intestinal epithelial cells. Hum. Mol. Genet. 2013;22:2705–2722. doi: 10.1093/hmg/ddt119.
- Schellenberger J, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 2011;6:1290–1307. doi: 10.1038/nprot.2011.308.
- Thiele I, et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 2013;31:419–425. doi: 10.1038/nbt.2488.
- Vlassis N, et al. Fast reconstruction of compact context-specific metabolic network models. PLoS Comp. Biol. 2014;10:e1003424. doi: 10.1371/journal.pcbi.1003424.
- Zhang Y, et al. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science. 2009;325:1544–1549. doi: 10.1126/science.1174671.