Summary
Genome-scale metabolic network reconstructions, assembled from annotated genomes, serve as a platform for integrating data from heterogeneous sources and generating hypotheses for further experimental validation. Implementing constraint-based modeling techniques such as Flux Balance Analysis (FBA) on network reconstructions allows for interrogating metabolism at a systems level, which aids in identifying and rectifying gaps in knowledge. With genome sequences for various organisms from prokaryotes to eukaryotes becoming increasingly available, a significant bottleneck lies in the structural and functional annotation of these sequences. Using topologically-based and biologically-inspired metabolic network refinement, we can better characterize enzymatic functions present in an organism and link annotation of these functions to candidate transcripts, both steps that can be experimentally validated.
Keywords: metabolic network, gap filling, orphan reactions, flux balance analysis
1. Introduction
Of the 2000+ genomes that have been sequenced, around 40% of the protein products that have been identified have no described function [1]. Over 5000 enzymatic functions have been described across all species, but more than a third have no known corresponding genes or proteins [2,3]. Bridging these gaps of knowledge between gene and function is important to fully utilize data available in the post-genomic era. Here, we describe computational methods available in genome-scale metabolic modeling that aim to provide hypotheses that fill in these knowledge gaps as well as methods to experimentally validate these computational predictions.
1.1 Genome Annotation
There are two major forms of genome annotation: functional and structural annotation. Functional annotation involves an understanding of the biological functions inherent in the genome, while structural annotation includes identifying coding regions of DNA that encode for a protein product, known as open reading frames (ORFs). Several computational techniques are available to structurally annotate ORFs directly for a newly sequenced organism [4,5] (See Note 1), but functional annotation can be more challenging to assign as the sequence of an ORF alone does not necessarily describe its biological function [6]. In addition, only around 1% of protein sequences have experimentally-derived annotations [7]; thus, computational techniques are necessary for feasibly assigning functional annotations.
Protein structure and function are often well conserved between organisms, so we can infer function between homologous genes or proteins across organisms [8]. This is useful for identifying potential functions of uncharacterized ORFs (see Figure 1), such as those in a newly sequenced genome. However, this approach can be limited because there may be multiple homologous sequences when comparing a query sequence against full sequences of all other organisms (termed the metagenome), and in reality only one of the many homologous sequences may actually demonstrate the appropriate enzymatic activity. The error rate for function assigned by sequence similarity may be as high as 49% [9]; thus, a more guided approach is necessary for assigning function to ORFs.
Figure 1.

Description of functional annotation of metabolic enzymes and current deficiencies. (A) Functionally annotated Open Reading Frame (ORF) which represents a gene, a transcript, a protein, and its enzymatic function (represented by an EC number). (B) Structurally annotated ORF that has no characterized functional annotation. (C) Evidence exists for an enzymatic activity in an organism, but the orphan reaction has no assigned gene, transcript, or protein.
1.2 Genome-Scale Metabolic Network Reconstruction
Genome-scale metabolic reconstructions serve as a platform for integrating data from heterogeneous sources and generating hypotheses for further experimental validation. Metabolic networks are constructed from existing genome annotations and manually expanded from literature-based sources and biochemical information contained in publicly available databases [10]. Resulting models contain a comprehensive set of known biochemical reactions and their associated ORFs. Implementing a systems-level approach allows for the identification of potential gaps in knowledge based on discrepancies between model predictions and experimental data (e.g., gene essentiality screens) as well as topological features of the network (e.g., pathways resulting in dead-ends). With the assistance of semi-automated algorithms and manual inspection, we can fill in these knowledge gaps by modifying the network to include additional biochemical reactions that were previously missing, or by removing functions that were improperly added by previous annotators. By finding ORFs encoding for enzymes orthologous to those that catalyze the same functions in other organisms, we can improve both the structural and functional annotation of the genome for an organism of interest, while also creating a higher quality metabolic model.
In this chapter, we describe methods to:
Predict missing and mis-annotated biochemical reactions for a given organism using metabolic network reconstructions. These include biologically-inspired refinements, which bridge the gap between model predictions and experimental data, as well as topologically-based algorithms that find and fill blocked pathways in a given network. These methods help improve the functional annotation of the genome for the organism of interest as well as improve the predictive ability of the metabolic model.
Assign candidate ORFs to novel functions as well as to existing functions that lack ORFs (orphan reactions, see Figure 1). These relationships provide the link between functional and structural annotation, which are both important to a higher quality annotation and metabolic model.
Use a systems approach to decide on which network modifications to include (and further validate) when posed with multiple gap filling solutions.
Perform experiments to verify existence of candidate ORFs. This will help strengthen our confidence in both the structural and functional annotation of the genome in the organism of interest.
2. Resources
2.1 Genomic information, bioinformatics tools and biochemical databases
Genome sequence for an organism of interest
To improve the genome annotation of an organism using a metabolic network approach, we need a whole genome DNA sequence. This information is important to identifying ORFs that may catalyze newly added reactions.
Availability: NCBI GenBank [11] (http://www.ncbi.nlm.nih.gov/genbank/) Other bioinformatics resources outside the scope of metabolic modeling are available through the NCBI at: http://www.ncbi.nlm.nih.gov/guide/
Information on Enzyme Commission (E.C.) classifications
The E.C. classification system is used to define enzymatic activities that can occur within different organisms, and an E.C. number characterizes in part the functional annotation for an enzyme (and correspondingly for the catalyzed reaction(s)). E.C.’s are classified according to the following hierarchical scheme: EC-1 (oxidoreductases), EC-2 (transferases), EC-3 (hydrolases), EC-4 (lyases), EC-5 (isomerases) and EC-6 (ligases). There can be several sub-classes under these six categories. For example, the enzyme hexokinase, which is associated with an E.C. number of 2.7.1.1, belongs to the class of ‘transferases’ (enzymes that aid in the transfer of a functional moiety from one metabolite to another) and the subclass ‘transferring phosphorus-containing groups’. Other enzymes in the same subclass include glucokinase (2.7.1.2) and galactokinase (2.7.1.6). The database ENZYME (part of ExPASy’s suite) contains information on E.C. numbers.
Availability: http://enzyme.expasy.org/
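The hierarchical scheme described above can be made concrete with a small sketch. It assumes only the six top-level classes listed in the text; the function names (parse_ec, ec_class_name, same_sub_subclass) are hypothetical helpers introduced for illustration and are not part of ENZYME or any other database interface.

```python
# Top-level E.C. classes exactly as listed in the text above.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_ec(ec_number):
    """Split a fully specified E.C. number such as '2.7.1.1' into four integers."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return tuple(int(p) for p in parts)


def ec_class_name(ec_number):
    """Return the top-level class name for an E.C. number."""
    return EC_CLASSES[parse_ec(ec_number)[0]]


def same_sub_subclass(ec_a, ec_b):
    """True when two E.C. numbers agree on their first three levels."""
    return parse_ec(ec_a)[:3] == parse_ec(ec_b)[:3]


# Hexokinase and galactokinase, using the E.C. numbers quoted in the text.
print(ec_class_name("2.7.1.1"))
print(same_sub_subclass("2.7.1.1", "2.7.1.6"))
```

Comparing only the leading levels of two E.C. numbers is a quick way to check whether two enzymes are expected to catalyze the same general type of reaction before looking at sequence similarity.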
BLAST (Basic Local Alignment Search Tool)
BLAST computes the sequence similarity between sequences of amino acids or nucleic acids [12]. This bioinformatics tool allows for quantitative, high-throughput comparisons in sequences between organisms in order to identify homologous ORFs that may share functional annotations [13,8].
Availability: http://blast.ncbi.nlm.nih.gov
Biochemical databases
Reconstructing the metabolism of a particular organism involves integrating biochemical information from various publicly available databases and experimental literature sources. Below, we provide a list of some important publicly available biochemical databases that contain information on genome, enzymes, reactions and/or pathways. The list below is not intended to be comprehensive; rather, it provides a flavor for the kinds of publicly available resources that can be used in the genome-scale metabolic reconstruction and modeling process.
KEGG (Kyoto Encyclopedia of Genes and Genomes) database contains comprehensive data on known enzymatic reactions that occur across various organisms [14–16].
Availability: http://www.genome.jp/kegg
ExPASy (Expert Protein Analysis System) contains comprehensive information on EC numbers and protein structure [17,18].
Availability: http://expasy.org/
SEED allows for quickly generating automated draft metabolic networks for prokaryotic organisms of interest [19].
Availability: http://www.theseed.org
MetaCyc contains comprehensive information on pathways and enzymes across many organisms [20,21].
Availability: http://metacyc.org/
GeneDB is a pathogen genome database maintained by the Wellcome Trust Sanger Institute [22].
Availability: http://www.genedb.org/Homepage
MetRxn allows queries of a comprehensive metabolite/reaction database and comparisons of metabolites/reactions between KEGG, MetaCyc, several metabolic reconstructions, and more [23].
Availability: http://metrxn.che.psu.edu/
UniProt is a comprehensive knowledgebase of annotated protein sequences across many organisms [24].
Availability: http://www.uniprot.org/
Metabase: A wiki database of biological databases [25].
Availability: http://metadatabase.org
2.2 High-throughput experimental data
Substrate utilization assays: experimental observations of cellular growth under different substrate conditions.
Availability: Experimental literature.
Gene essentiality assays: experimental observations of cellular growth when a gene is knocked out or knocked down (e.g. the Keio collection of mutants for Escherichia coli [26,27])
Availability: DEG (Database of Essential Genes) http://tubic.tju.edu.cn/deg/ [28,29], OGEE (Online GEne Essentiality database) http://ogeedb.embl.de/ [30], and experimental literature.
2.3 Metabolic modeling setup
Metabolic model for an organism of interest
Methods in this chapter are geared towards improving the annotation of organisms for which a genome-scale reconstruction is available (See Note 2). A metabolic network consists of two major components: the stoichiometric matrix (S-matrix) and a set of rules for gene-protein-reaction (GPR) relationships. The S-matrix is comprised of biochemical reactions that occur in an organism while GPR relationships represent conditional statements in Boolean logic between ORFs and their enzymatic functions in the S-matrix.
Availability (typically in the SBML format [31]): BiGG database [32], MEMOSys [33], SEED [19], and literature.
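As an illustration of the two components described above, the following sketch builds a toy S-matrix and a set of GPR rules in plain Python. The network, gene names, and the nested-tuple encoding of Boolean rules are all hypothetical choices made for this example; they do not reflect how COBRA or SBML store a model.

```python
# Hypothetical toy network: (source) -> A -> B -> (sink)
# Rows are metabolites, columns are reactions.
metabolites = ["A", "B"]
reactions = ["uptake_A", "A_to_B", "secrete_B"]
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by secrete_B
]

# GPR rules as a gene name, a nested ("and"/"or", ...) tuple, or None.
gpr = {
    "uptake_A": "geneT",
    "A_to_B": ("or", "gene1", ("and", "gene2", "gene3")),  # isozyme or complex
    "secrete_B": None,  # no associated ORF (an orphan reaction)
}


def is_steady_state(S, v, tol=1e-9):
    """Check the mass-balance condition S*v = 0 for a flux vector v."""
    return all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)


def rule_active(rule, knocked_out):
    """Evaluate a GPR rule given a set of knocked-out genes."""
    if rule is None:
        return True  # no gene association, so knockouts cannot disable it
    if isinstance(rule, str):
        return rule not in knocked_out
    op, *operands = rule
    results = [rule_active(r, knocked_out) for r in operands]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator {op!r}")


def disabled_reactions(gpr, knocked_out):
    """List reactions whose GPR rule evaluates to False after a knockout."""
    return [rxn for rxn, rule in gpr.items() if not rule_active(rule, knocked_out)]


print(is_steady_state(S, [1.0, 1.0, 1.0]))
print(disabled_reactions(gpr, {"gene1", "gene2"}))
```

Keeping the stoichiometry and the gene rules separate means that a gene knockout only changes which reactions are allowed to carry flux, while the S-matrix itself is left unchanged.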
Types of modifications that can be made to the metabolic network
In the process of curating a metabolic reconstruction, various types of network modifications can be made (see Figure 2):
Figure 2.

Toy network depicting two blocked reactions (BR) caused by root no-consumption (RNC) and root no-production (RNP) metabolites. Categories of reactions that may be added to the network (suggestions aimed at restoring flux through the RNC metabolite): (Category I) Added reversibility to an existing reaction, (Category II) Added intracellular reaction, (Category III) Added extracellular transport reaction, (Category IV, not shown) Added intracellular transport reaction. Note that in this case the Category II solution restores flux through both dead-end metabolites.
Category I: Adding reversibility to an existing reaction in the network.
Category II: Adding a new intracellular enzymatic or spontaneous reaction.
Category III: Adding an extracellular transport reaction where the metabolite can either be taken up or secreted by the cell. These reactions define nutrient conditions.
Category IV: Adding a new transport reaction within the cell. These reactions are often lumped with Category III reactions and are limited to compartmentalized metabolic models (See Note 3).
2.4 Metabolic Modeling Tools: COBRA software
The Constraint-Based Reconstruction and Analysis (COBRA) Toolbox is a platform implemented in MATLAB (Mathworks, Natick, MA) for interrogating metabolic reconstructions. Many functions within COBRA require mathematical programming solvers. These functions often utilize data from KEGG as a reaction database.
Availability: http://opencobra.sourceforge.net/openCOBRA [34]
SBML Toolbox imports SBML formatted metabolic models [31] into the COBRA Toolbox.
Availability: SBML Toolbox (http://sbml.org/Software/SBMLToolbox) [35]
Flux Balance Analysis (FBA) identifies a flux distribution through the reaction network that produces an optimal flux through the objective function.
Availability: COBRA Toolbox 2.0 under optimizeCbModel()
Flux Variability Analysis (FVA) computes the ranges of possible fluxes for all reactions in a network while still maintaining a primary objective flux value such as optimal biomass production [36,37].
Availability: COBRA Toolbox 2.0 under fluxVariability()
GapFind finds all dead-end metabolites in a network including root no-production and root no-consumption metabolites (see Figure 2A) [38].
Availability: COBRA Toolbox 2.0 under gapFind()
DetectDeadEnds finds some dead-end metabolites, all of which participate in only one reaction as determined by the S-matrix.
Availability: COBRA Toolbox 2.0 under detectDeadEnds()
Flux Sampling samples feasible flux distributions without a user-defined objective function, identifying reactions that can only carry zero flux (blocked reactions).
Availability: COBRA Toolbox 2.0 under gpSampler()
SMILEY predicts a minimum set of enzymatic or transport reactions to add in order to sustain growth in one condition [39].
Availability: COBRA Toolbox 2.0 under growthExpMatch().
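For reference, the FBA and FVA problems listed above are commonly written as linear programs. The formulation below is the standard textbook form, given as a sketch; the symbols ($v$ for the flux vector, $c$ for the objective weights, $v^{lb}$ and $v^{ub}$ for the bounds, $Z^{*}$ for the optimal objective value, and $\gamma$ for the fraction of the optimum to retain) are notation assumed here and do not appear in the original text.

```latex
% Flux Balance Analysis: maximize the objective subject to steady state and bounds
Z^{*} = \max_{v} \; c^{T} v
\quad \text{subject to} \quad
S v = 0, \qquad v^{lb} \le v \le v^{ub}

% Flux Variability Analysis: for each reaction j, find the flux range that
% still sustains a fraction \gamma (0 \le \gamma \le 1) of the FBA optimum
v_{j}^{\min},\; v_{j}^{\max} = \min_{v} / \max_{v} \; v_{j}
\quad \text{subject to} \quad
S v = 0, \qquad v^{lb} \le v \le v^{ub}, \qquad c^{T} v \ge \gamma Z^{*}
```

Under this notation, a reaction whose FVA range is approximately $[0, 0]$ cannot carry flux under the chosen constraints.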
2.5 Metabolic Modeling Tools: Pathway Tools software
The Pathway Tools software environment integrates genome, pathway, and regulatory data for analysis and visualization [40,41]. These functions often utilize data from MetaCyc and UniProt as a reaction database.
Availability: http://biocyc.org/download.shtml
MetaFlux utilizes multiple gap-filling methods to aid in developing metabolic models and defining a feasible biomass reaction [42].
PHFiller (Pathway Hole Filler) finds genes for orphan reactions using BLAST and protein databases [43].
PHFiller-GC (Pathway Hole Filler – Genome Context) extends on the PHFiller algorithm to use a context-specific prediction of genes for orphan reactions based on shared pathways, shared operons between proteins, shared proteins in a complex, and regulatory interactions [44].
2.6 Metabolic Modeling Tools: Stand-alone algorithms
GapFill suggests adding reactions to restore flux through dead-end metabolites [38].
Availability: http://maranas.che.psu.edu/software.htm
GrowMatch suggests adding or removing reactions to reconcile differences between model predictions and gene essentiality screens and nutrient utilization assays [45]. Requires a reaction database.
Availability: http://maranas.che.psu.edu/software.htm and modified version implemented in SEED [19]
OMNI (optimal metabolic network identification) compares metabolic flux analysis data and in silico predictions of flux distributions and suggests reactions to add/remove to better correlate predictions and experimental data [46].
Availability: See Supporting Information in Ref. [46]
BNICE suggests reactions that can consume or produce metabolites based on reaction rules from the EC classification system. [47]
Availability: See Methods in Ref. [47]
2.7 Experimental validation of candidate ORFs
In recent literature, experimental validation of candidate transcripts has been performed for the alga Chlamydomonas reinhardtii [48–50]. See Section 3.6 for a brief description of some key methods. Additionally, see Refs. [48–50] for a detailed description of the materials, reagents and protocols used to perform the experimental validations.
3. Methods
Here, we describe iterative steps for improving the genome annotation for an organism of interest using genome-scale metabolic network modeling. This is done using a systems approach to identify novel reactions that take place within an organism and functionally annotate ORFs that catalyze previously uncharacterized and newly identified enzymatic functions [51]. These methods are organized into four major steps, which can be iterated with each other (see Figure 3 for overview):
1. Suggest modifications to the S-matrix by adding or removing biochemical reactions based on:
   a. Manual inspection of literature evidence (Subsection 3.1)
      - Early-stage examination of central metabolism
      - Mid-stage multi-pathway refinement
      - Late-stage network validations
   b. Semi-automated analysis of network topology (Subsection 3.2)
      - Gap filling based on dead-end metabolites
      - Gap filling based on blocked reactions
   c. Semi-automated analysis of experimental data (Subsection 3.3)
      - Adding reactions based on prediction/experimental discrepancies
      - Removing reactions based on prediction/experimental discrepancies
2. Assign candidate transcripts/ORFs that catalyze candidate reactions and existing orphan reactions (Subsection 3.4)
3. Manually choose network modifications at the systems-level (Subsection 3.5)
4. Experimentally verify the presence and structure of candidate ORFs (Subsection 3.6).
Figure 3.

Workflow for improving the annotation of a genome sequence using bioinformatics techniques (light gray boxes) and metabolic modeling (white boxes). Dark gray box emphasizes the importance of existing knowledge databases for biochemical reactions in assigning function through sequence homology: Forward (Fwd.) BLAST compares the sequence of an uncharacterized ORF against the genomes of other organisms in order to infer function. Reverse (Rev.) BLAST identifies proteins from other organisms that catalyze the same reaction as an orphan reaction and compares their sequences against the genome of the organism of interest in order to find an ORF that may catalyze the orphan reaction.
Throughout and after the process of reconstructing a metabolic network, use these methods to identify and address knowledge gaps (see Table 1). Iterate through these semi-automated algorithms combined with manual inspection and experimental validation in order to generate a consistently high-quality metabolic model and contribute to the genome annotation of an organism.
Table 1.
Problem-driven methods to identify and reconcile knowledge gaps in metabolic models
| Type of Knowledge Gaps in Metabolic Network | Methods to Identify Knowledge Gaps | Methods to Reconcile Knowledge Gaps | Resulting Network Modifications |
|---|---|---|---|
| Dead-end metabolites: Metabolites cannot be consumed or produced in steady-state simulations | GapFind; DetectDeadEnds | GapFill; BNICE | Add new reactions: Category II preferred |
| Blocked reactions: Reactions cannot carry flux due to dead-end metabolites | FVA; Flux Sampling | SMILEY | Add new reactions: Category II preferred |
| Metabolic model is inconsistent with biological data (e.g. fluxomics data, gene essentiality or nutrient utilization assays) | FBA predicts no growth when growth is experimentally observed | SMILEY; GrowMatch; OMNI | Add new reactions: Category II preferred |
| | FBA predicts growth when growth is not experimentally observed | GrowMatch; OMNI | Remove existing reactions: Category I or III preferred |
| Orphan reactions: Metabolic reactions that have no associated ORFs in the GPR | findOrphanRxns | Reverse BLAST; PHFiller; PHFiller-GC | Assign ORF to orphan reaction (pending experimental verification) |
3.1 Biologically-inspired metabolic network refinement
At all stages in the reconstruction process, it is important to evaluate the functionality of the network model so that any subsequent modifications consistently lead to a higher quality model. Any deficiencies in model functionality should be manually examined so as to identify enzymes/reactions to fill in these gaps in knowledge.
Early-stage examination of central metabolism: While drafting a reconstruction, examine central metabolic pathways (e.g. glycolysis, TCA cycle and pentose phosphate pathway) with literature support for completeness. Visualize these pathways in biochemical reaction databases (see Subsection 2.1) such as KEGG. For example, to evaluate functionality of glycolysis in a newly reconstructed metabolic network, use FBA to optimize for pyruvate production via the pyruvate kinase reaction under glucose-only nutrient conditions. If zero flux is obtained for the objective, manually check for gaps or deficiencies in the pathway.
Mid-stage multi-pathway refinement: At intermediate stages of model building, expand this process from individual pathways to include multiple pathways (such as amino acid and nucleotide metabolism). For example, use FBA to simulate the production of individual amino acids and nucleotides given a particular carbon source (e.g. glucose). It is important to utilize experimental literature evidence in this process (for example, not all organisms can synthesize all 20 amino acids de novo and may need to scavenge necessary amino acids from the environment).
Late-stage network validations: In the later stages of the reconstruction process, ensure basic functionality of the model in the context of a biologically-inspired objective function such as biomass or ATP production. A semi-automated algorithm for establishing a feasible biomass is MetaFlux. This algorithm suggests a maximum subset of biomass metabolites that can be produced given a minimum set of network modifications in the form of added/removed reactions. This algorithm accommodates an initial biomass reaction that may include some metabolites that are unable to be produced. Network changes can be manually inspected for feasibility using visualization software that is integrated with the Pathway Tools platform.
3.2 Using network topology to address dead-end metabolites and blocked reactions
At any stage in the reconstruction process, there may be blocked reactions, which are reactions that cannot carry flux in a metabolic model, usually as a result of containing, being upstream (root no-consumption), or being downstream (root no-production) of a dead-end metabolite (see Figure 2). Unblocking these reactions can be performed at the metabolite level, aimed at restoring fluxes that utilize the dead-end metabolite, or at the reaction level, aimed at restoring flux through the blocked reaction. These two general methods may provide different gap filling solutions, though they share the same root causes.
For network-topology based methods in the COBRA toolbox, set nutrient uptake (lower bound of exchange reactions) to −1 and all other reaction bounds to a large number such as 100000 (See Note 4).
At the metabolite level: restoring flux through dead-end metabolites
(optional) De-compartmentalize the model into only intracellular and extracellular compartments (i.e. remove sub-cellular compartments such as mitochondria, endoplasmic reticulum, nucleus, etc.). Dead-end metabolites may block additional reactions based on compartmentalization, and in one instance this was addressed by de-compartmentalizing the human metabolic model [52]. In this case, substantially fewer reactions were blocked making this approach more manageable, while forgoing the addition of Category IV reactions.
-
Identify dead-end metabolites using one or both of the following methods:
GapFind: Use the function gapFind() with a COBRA model, which will return all root no-production (and optionally all root no-consumption) dead-end metabolites. Although gaps shown in Figure 2 are fairly obvious, some non-obvious gaps are possible within the scope of a metabolic network [51].
DetectDeadEnds: Supply detectDeadEnds() with a COBRA model, which will return a list of all metabolites that participate in only one reaction (optionally excluding extracellular metabolites) [34]. This method does not return all possible dead-end gaps, but it does detect metabolites whose only reaction is reversible (a Category I solution of a dead-end), which GapFind will not detect because these are not technically dead-ends. For such metabolites, it may be preferable to identify Category II solutions that incorporate the metabolite into metabolic pathways.
-
Suggest reactions that include the dead-end metabolites as products for root no-production dead-ends and/or as substrates for root no-consumption cases. Semi-automated algorithms for suggesting reactions include:
GapFill: Supply GapFill with a reaction database, a list of all reactions from the network reconstruction (See Note 5), and root no-production/consumption metabolites. Only category I, II, and III suggestions will be returned.
BNICE: Supply BNICE with each pair of dead-end metabolites, which will suggest feasible biochemical reactions that link the two (See Note 6).
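The distinction between the two identification methods above can be illustrated with a purely topological sketch. It is not a reimplementation of gapFind() or detectDeadEnds(); it only counts, for a hypothetical toy network, which metabolites are never produced, never consumed, or appear in a single reaction. The network layout and the function name classify_dead_ends are assumptions made for this example.

```python
# Hypothetical toy network: reaction -> (stoichiometry, reversible?)
# Negative coefficients are substrates, positive coefficients are products.
toy_network = {
    "EX_A": ({"A": 1}, False),            # uptake of A from the medium
    "R1": ({"A": -1, "B": 1}, False),     # A -> B
    "R2": ({"B": -1, "C": 1}, False),     # B -> C, nothing consumes C
    "R3": ({"D": -1, "E": 1}, False),     # D -> E, nothing produces D
    "EX_E": ({"E": -1}, False),           # secretion of E
    "R4": ({"B": -1, "F": 1}, True),      # B <-> F, F appears only here
}


def classify_dead_ends(network):
    """Classify metabolites by simple counting over the reaction list."""
    produced, consumed, counts = set(), set(), {}
    for stoich, reversible in network.values():
        for met, coeff in stoich.items():
            counts[met] = counts.get(met, 0) + 1
            # A reversible reaction can both produce and consume a metabolite.
            if coeff > 0 or reversible:
                produced.add(met)
            if coeff < 0 or reversible:
                consumed.add(met)
    mets = set(counts)
    return {
        "root_no_production": sorted(mets - produced),
        "root_no_consumption": sorted(mets - consumed),
        "single_reaction": sorted(m for m, n in counts.items() if n == 1),
    }


result = classify_dead_ends(toy_network)
for category, found in result.items():
    print(category, found)
```

In this toy case, metabolite F is reported only by the single-reaction count because its one reaction is reversible, which mirrors the difference between the two methods noted in the text.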
At the reaction level: restoring flux to blocked reactions
-
Identify blocked reactions using one or both of the following methods:
FVA: Using a COBRA model with any feasible objective function (one that carries positive flux using FBA), perform FVA with fluxVariability() on all reactions (See Note 7). (This may be useful for identifying blocked reactions in other contexts, but not necessarily for adding new reactions to the model.) Consider all reactions that have lower and upper flux ranges of approximately [0, 0] as blocked reactions. Perform FVA with loops both allowed and disallowed (set allowLoops to 1 or 0, respectively); allowing loops will hide reactions that cannot carry flux unless a potentially thermodynamically infeasible loop is carrying flux, which may or may not be relevant to the biology of the model.
Flux sampling: Execute gpSampler() on a COBRA model to sample possible fluxes throughout all reactions. This algorithm does not require an objective function. Consider reactions without non-zero sampled flux values as blocked reactions (See Note 7).
-
Suggest additional reactions that restore flux through each blocked reaction. In this case, solutions can alleviate blocked reactions.
SMILEY: For each blocked reaction, set the objective function to the blocked reaction (See Note 8), and then execute growthExpMatch() with a relevant reaction database from KEGG (see Subsection 2.1). Running multiple iterations will suggest minimum subsets of additional reactions that restore flux to the blocked reaction. This algorithm will suggest Category I, II, III, and if not de-compartmentalized, Category IV reactions. For an example of applying SMILEY to restore flux through blocked reactions, see Ref. [52].
Proceed to identifying candidate ORFs for these reactions (Subsection 3.4) and narrow down choices of network modifications (Subsection 3.5) to experimentally validate (Subsection 3.6).
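To show how a single dead-end metabolite can block reactions further upstream, the sketch below propagates blocking through a hypothetical toy network. This is a topological stand-in and not the FVA or flux sampling procedures described above: it only removes reactions that touch a metabolite that is never produced, never consumed, or used by just one remaining reaction, so it may miss reactions that an optimization-based method would report. The network and the function name topologically_blocked are assumptions for illustration.

```python
# Hypothetical toy network: reaction -> (stoichiometry, reversible?)
toy_network = {
    "EX_A": ({"A": 1}, False),           # uptake of A
    "R1": ({"A": -1, "B": 1}, False),    # A -> B
    "R2": ({"B": -1, "C": 1}, False),    # B -> C
    "R3": ({"C": -1, "D": 1}, False),    # C -> D, D is never consumed
    "EX_B": ({"B": -1}, False),          # secretion of B
}


def topologically_blocked(network):
    """Iteratively remove reactions that cannot carry steady-state flux."""
    active = dict(network)
    blocked = []
    changed = True
    while changed:
        changed = False
        produced, consumed, counts = set(), set(), {}
        for stoich, reversible in active.values():
            for met, coeff in stoich.items():
                counts[met] = counts.get(met, 0) + 1
                if coeff > 0 or reversible:
                    produced.add(met)
                if coeff < 0 or reversible:
                    consumed.add(met)
        for rxn in list(active):
            stoich, _ = active[rxn]
            # A reaction is blocked if any of its metabolites cannot be balanced.
            if any(
                m not in produced or m not in consumed or counts[m] == 1
                for m in stoich
            ):
                blocked.append(rxn)
                del active[rxn]
                changed = True
    return blocked  # in order of discovery


blocked = topologically_blocked(toy_network)
print(blocked)
```

In this example the reaction producing the dead-end metabolite is found first, and removing it exposes a second blocked reaction upstream, which is why a single added reaction can sometimes unblock several reactions at once.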
3.3 Gene essentiality screens, nutrient utilization assays, and fluxomics data suggest experimentally-inspired model refinements
Implementing FBA on metabolic network reconstructions provides the ability to predict growth yields for organisms under different substrate conditions and genetic perturbations [53] (See Note 9). Implementing FBA on the iAF1260 metabolic reconstruction of E. coli yielded an accuracy of 91% for gene essentiality predictions as compared to experimental observations [54].
Adding (or removing) reactions to reconcile predictions with experimental data
Define a biologically relevant objective function for the metabolic model [55,56]. Consider a biomass function of nucleic acids, amino acids, lipids, and energy maintenance when comparing predictions to growth assays. Some experimental screens measure secretion of metabolites or other phenotypic properties, so adjust the objective of the metabolic model accordingly.
Define nutrient conditions relevant to the biological setting. Consider carbon, nitrogen, phosphorus, and sulfur sources as well as the presence of oxygen (See Ref. [57] for an example of establishing a minimal media relevant to biological conditions).
Manipulate the model to emulate any experimental perturbations. For example, remove relevant reactions from a COBRA model using removeRxns() or deleteModelGenes() for essentiality screens.
(optional) Ensure feasibility of the biomass function for at least one condition manually by FBA. Using algorithms in this section with an objective that requires a large number of additional reactions is usually neither computationally feasible nor biologically relevant.
-
Identify knowledge gaps by comparing experimental data to model data. Perform FBA for each condition and compare output to observed result (See Table 1). Possible discrepancies occur when:
Model predicts growth (or other relevant output) when no growth is experimentally observed; as a result, remove reactions from network
Model predicts no growth (or other relevant output) when growth is experimentally observed; as a result, add reactions to network
-
Suggest reactions to add (or remove) using one or more of the following semi-automated algorithms (See Note 10):
SMILEY: Execute growthExpMatch() with a relevant reaction database from KEGG (See Subsection 2.1) for each condition that needs rectification. Results include suggestions of only additional reactions. For an example of applying SMILEY to gene essentiality data in E. coli, see Ref. [57].
GrowMatch: GrowMatch suggests sets of reactions to add/remove that reconcile predictions for at least one growth condition. This algorithm identifies which network changes resolve model predictions and experimental observations for one growth condition while creating inconsistencies in other conditions.
OMNI: Supply OMNI with a library of reactions from a reaction database such as KEGG, a list of existing reactions that are allowed to be removed, and fluxomics data. Suggestions include both added and removed reactions that cause fluxomics data and metabolic predictions to match.
Proceed to identifying candidate ORFs for these reactions (Subsection 3.4) and narrow down choices of network modifications (Subsection 3.5) to experimentally validate (Subsection 3.6).
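The comparison step in this subsection can be sketched as a simple bookkeeping exercise. The code below assumes that growth predictions and observations have already been reduced to True/False values per condition; the condition names and data are hypothetical, and the function classify_discrepancies is introduced here only for illustration.

```python
def classify_discrepancies(predicted, observed):
    """Sort conditions by how model predictions compare with experiments."""
    outcome = {"consistent": [], "add_reactions": [], "remove_reactions": []}
    for condition in sorted(predicted.keys() & observed.keys()):
        model_grows = predicted[condition]
        cells_grow = observed[condition]
        if model_grows == cells_grow:
            outcome["consistent"].append(condition)
        elif cells_grow:
            # Model predicts no growth but growth is observed: reactions are missing.
            outcome["add_reactions"].append(condition)
        else:
            # Model predicts growth but none is observed: reactions should be removed.
            outcome["remove_reactions"].append(condition)
    return outcome


def accuracy(outcome):
    """Fraction of compared conditions where prediction and experiment agree."""
    total = sum(len(conditions) for conditions in outcome.values())
    return len(outcome["consistent"]) / total if total else 0.0


# Hypothetical single-gene knockout screen (True = growth).
predicted = {"ko_gene1": True, "ko_gene2": False, "ko_gene3": True, "ko_gene4": False}
observed = {"ko_gene1": True, "ko_gene2": True, "ko_gene3": False, "ko_gene4": False}

outcome = classify_discrepancies(predicted, observed)
print(outcome)
print(accuracy(outcome))
```

Grouping conditions this way makes it straightforward to hand each group to an appropriate algorithm, since only some of the methods above suggest reaction removals.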
3.4 Predicting candidate ORFs for orphan reactions
Orphan reactions are enzymatic reactions that have no assigned ORFs or proteins that catalyze them (See Figure 1C). The disconnect between structural annotation and functional annotation occurs either in the scope of an individual organism (local orphan) or across all organisms (global orphan). For local orphan reactions, BLAST and BLAST-related algorithms are useful for identifying potential ORFs, but global orphan reactions have no known enzymes against which we can compare sequence similarity. In a metabolic model, an orphan reaction has no relationship defined in the GPR, but the stoichiometry is defined in the S-matrix. This section details methods to annotate newly added orphan reactions from Subsections 3.1–3.3 as well as orphan reactions already included in the model.
In a COBRA model, add candidate reactions and perform findOrphanRxns() to locate all orphan reactions.
Identify whether each reaction is a local orphan reaction or a global orphan reaction. Query a reaction database such as KEGG or MetaCyc to see a list of annotated protein sequences across all organisms. Reactions that lack annotated sequences in every organism are global orphan reactions; those with annotated sequences in other organisms are local orphan reactions.
-
Use the following algorithms in order to identify potential ORFs in the genome of the organism of interest that encode for enzymes to catalyze each orphan reaction:
Reverse BLAST: Manually identify proteins in other organisms that share the same enzymatic function (E.C. number) from MetaCyc or a similar database. Choose proteins from phylogenetically similar organisms first. Perform BLAST for each candidate protein against the whole genome of the modeled organism to identify similar sequences. See Ref. [8] for details on different BLAST techniques. Results are limited to local orphans as they rely on enzymatic functions linked with ORFs in other organisms.
PHFiller: Execute PHFiller to identify proteins that catalyze the orphan reaction of interest. PHFiller returns lists of candidate ORFs from the organism’s sequenced genome that may catalyze each orphan reaction. This algorithm semi-automates performing Reverse BLAST using existing protein databases, and results are limited to local orphans as they rely on enzymatic functions linked with ORFs in other organisms.
PHFiller-GC: PHFiller-GC improves on PHFiller by considering not only BLAST sequence similarities, but also similarity based on other associations such as shared protein complexes, shared operons, and regulatory and transcription factors. This allows the algorithm to explore beyond the structural level into pathways when identifying candidate ORFs for global orphan reactions.
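The classification that decides which of the algorithms above applies can be sketched in Python as follows. The `annotated_sequences` table stands in for the result of a KEGG or MetaCyc query; it, the organism names, protein IDs, and both helper functions are hypothetical and do not correspond to any database API.

```python
# Hypothetical lookup table, standing in for a KEGG or MetaCyc query:
# reaction ID -> {organism: [annotated protein IDs]}.
annotated_sequences = {
    "RXN_1": {"organism_X": ["protX1"], "organism_Y": ["protY7"]},
    "RXN_2": {"organism_Y": ["protY3"]},
    "RXN_3": {},
}


def classify_reaction(rxn_id, annotations, model_organism):
    """Return 'annotated', 'local orphan', or 'global orphan'."""
    by_organism = annotations.get(rxn_id, {})
    if by_organism.get(model_organism):
        return "annotated"
    annotated_elsewhere = any(
        proteins for org, proteins in by_organism.items()
        if org != model_organism
    )
    return "local orphan" if annotated_elsewhere else "global orphan"


def reverse_blast_queries(rxn_id, annotations, model_organism):
    """Collect proteins from other organisms to use as BLAST queries."""
    by_organism = annotations.get(rxn_id, {})
    return sorted(
        protein
        for org, proteins in by_organism.items()
        if org != model_organism
        for protein in proteins
    )


labels = {
    rxn_id: classify_reaction(rxn_id, annotated_sequences, "organism_X")
    for rxn_id in annotated_sequences
}
print(labels)
print(reverse_blast_queries("RXN_2", annotated_sequences, "organism_X"))
```

Under these assumptions, local orphans yield query proteins for Reverse BLAST or PHFiller, while global orphans return an empty query list and would need the association-based evidence that PHFiller-GC uses.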
3.5 Choosing reactions to experimentally validate
Before proceeding to experimental methods for verifying candidate ORFs and validating their function, efforts should be made to manually curate the metabolic model. All algorithms described in Subsections 3.1–3.3 are defined as semi-automated because manual inspection is essential to ensuring that only biologically-relevant and plausible reactions are added or removed from the metabolic network. Using all gap filling methods described in this chapter would yield an exorbitant number of suggested network modifications, so this section outlines general considerations for selecting reactions to subsequently validate.
Use a systems approach to add or remove reactions: consider simpler solutions that resolve the most knowledge gaps with the fewest network changes before addressing each problem on its own. Instead of adding all reactions from these methods to the model at once, select only a few solutions and iterate through the model and rerun gap-filling techniques. This iterative, systems approach to gap filling will ensure higher quality models and better genome annotations.
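One way to picture this selection step is a greedy pass over the candidate solutions, sketched below in Python. The solution names, reaction IDs, gap IDs, and the `pick_for_iteration` helper are hypothetical; the greedy rule is only an illustrative heuristic for "most gaps resolved with the fewest changes", not a method prescribed by any of the gap-filling algorithms.

```python
# Each candidate solution lists the reactions it would add and the
# knowledge gaps (e.g. inconsistent growth conditions) it would resolve.
candidates = [
    {"name": "A", "reactions": {"r1"}, "resolves": {"g1", "g2", "g3"}},
    {"name": "B", "reactions": {"r2", "r3"}, "resolves": {"g1", "g2", "g3"}},
    {"name": "C", "reactions": {"r4"}, "resolves": {"g4"}},
    {"name": "D", "reactions": {"r5", "r6", "r7"}, "resolves": {"g4", "g5"}},
]


def pick_for_iteration(solutions, max_picks=2):
    """Greedily pick a few solutions for one curation round.

    Prefers the solution resolving the most still-unresolved gaps and
    breaks ties in favor of fewer network changes.
    """
    unresolved = set().union(*(s["resolves"] for s in solutions))
    remaining = list(solutions)
    chosen = []
    while remaining and unresolved and len(chosen) < max_picks:
        best = max(
            remaining,
            key=lambda s: (len(s["resolves"] & unresolved), -len(s["reactions"])),
        )
        if not best["resolves"] & unresolved:
            break  # nothing left that helps
        chosen.append(best["name"])
        unresolved -= best["resolves"]
        remaining.remove(best)
    return chosen, unresolved


chosen, still_unresolved = pick_for_iteration(candidates)
print(chosen, still_unresolved)
```

After the chosen solutions are manually inspected and added, the gap-filling algorithms would be rerun on the updated model before the next round of picks.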
Considerations for adding reactions based on suggestions from semi-automated algorithms:
When multiple solutions are available for the various gap filling conditions, choose small subsets of reactions that reconcile model predictions with the most experimental conditions.
Perform several iterations of each algorithm to produce a comprehensive set of possible reactions. When solutions are limited, try other databases where available.
Choose reactions that have candidate ORFs when possible over those that will remain local orphan reactions.
Examine all solutions for biological feasibility.
Prioritize adding Category II reactions when possible (See Table 1).
Category I and III reactions are simplified solutions (e.g. exporting a root no-consumption metabolite out of the cell to unblock a reaction). Add Category I reactions only when literature supports the reversibility of an enzymatic reaction, or a separate distinct enzyme can catalyze the opposite reaction. Add Category III reactions only when literature supports the concept that a metabolite is excreted and/or a transporter or relevant channel has been characterized.
Calculate feasible ranges of ΔG (free energy) for each reaction in the organism of interest and exclude thermodynamically infeasible reactions.
Add reactions cautiously, as thermodynamically infeasible loops can arise within and between compartments (if applicable). For example, in a published reconstruction of human metabolism [58], a mitochondrial transporter allows symport of an H+ ion and lactate into the mitochondria, while a separate spontaneous reaction allows lactate to diffuse freely from the mitochondria into the cytoplasm. In the model, this loop effectively serves as a free H+ transporter into the mitochondria, fueling infinite ATP production through ATP synthase, which uses the high concentration gradient of protons between the cytosol and mitochondria to drive this core metabolic reaction.
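The lactate example above can be checked by summing the stoichiometry of the two reactions, as in the Python sketch below. The metabolite and reaction identifiers are hypothetical and are not the names used in the reconstruction [58]; `net_conversion` is an illustrative helper.

```python
from collections import defaultdict

# Stoichiometry of the two reactions in the loop (negative = consumed).
# Suffix _c = cytosol, _m = mitochondria; IDs are illustrative assumptions.
reactions = {
    "LAC_H_symport": {"h_c": -1, "lac_c": -1, "h_m": 1, "lac_m": 1},
    "LAC_diffusion": {"lac_m": -1, "lac_c": 1},
}


def net_conversion(rxns, fluxes):
    """Sum stoichiometry weighted by flux and drop metabolites that cancel."""
    net = defaultdict(int)
    for rxn_id, flux in fluxes.items():
        for met, coeff in rxns[rxn_id].items():
            net[met] += coeff * flux
    return {met: value for met, value in net.items() if value != 0}


# One unit of flux through each reaction closes the lactate cycle.
net = net_conversion(reactions, {"LAC_H_symport": 1, "LAC_diffusion": 1})
print(net)
```

Lactate cancels in both compartments, so the only net effect is the movement of one proton from the cytosol into the mitochondria without any cost, which is the free H+ transport described above.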
Considerations for removing reactions based on suggestions from semi-automated algorithms:
Some metabolic models have confidence scores associated with annotated GPR relationships, such as those in the BiGG database [32]. Choose to remove reactions with low confidence (annotated based on inferred function made only by sequence homology) over those with high confidence (supported with biochemical and literature evidence).
Be cautious in removing reactions permanently. Enzymatic function may be possible at the genome-scale under the right conditions, but enzyme expression or function may be dependent on specific environmental, signaling, regulatory, or time-dependent factors that prevent the organism from adapting to a particular growth condition.
Not all reactions need to be unblocked. Evolutionary trends may have caused the loss of a key metabolic enzyme in a pathway, leaving the other members structurally and functionally intact at the molecular level but without function in the larger scope of the metabolic network. These are called biological gaps instead of knowledge gaps [51]. Thus, be cautious and consider adding reactions only when justified in the scope of biology (See Subsection 3.6).
Prioritize removing Category I and III reactions over Category II reactions unless literature evidence strongly supports lack of a transport or reversible reaction.
3.6 Experimentally verifying and sequencing candidate ORFs
The process of adding new reactions and adopting orphan reactions (and consequently incorporating new ORFs into the model) yields an opportunity to structurally annotate candidate transcripts through experimental validation. For each candidate transcript identified in this workflow, we can experimentally verify the presence and sequence of the ORF with the following methods. Note that the following methods were used in part in recent literature to perform model-driven experimental validation of candidate transcripts in the alga C. reinhardtii [48–50].
As a first step, the presence of candidate ORFs can be verified using RT-PCR (reverse transcription polymerase chain reaction). In this method, an RNA strand is reverse transcribed into complementary DNA (cDNA) with the aid of the enzyme reverse transcriptase. The resulting cDNA strands are used as templates in PCR reactions. RT-PCR is performed with forward and reverse primers corresponding to the putative ORFs of the enzymes of interest. This step can be used for amplification of ORFs. If only a partial sequence of an ORF is available or if the RT-PCR step failed due to errors in ORF termini, RACE (rapid amplification of cDNA ends) can be performed to define the transcript boundaries [59]. RACE is designed to identify the 5′ and 3′ ends of a transcript if the ORFs could either not be cloned or could be verified only at one end. In this method, cDNAs are generated by using PCR to amplify copies of sequence between a point within the transcript and the end (either the 3′ or 5′ end). The minimum information that is required is essentially a short stretch of sequence in the ORF to be cloned [60]. The ORF-specific primers are tailed with Gateway compatible sequences (see Gateway cloning technology from Invitrogen). The amplicons generated from RT-PCR are cloned using a Gateway donor vector and are transformed into E. coli. These transformed bacteria are used as a source of template in PCR reactions. The amplicons can be sequenced using different methods. For instance, the 5′ and 3′ ends can be verified by high-throughput Sanger sequencing, or ORF amplicons can be sequenced using the Roche 454FLX Titanium sequencing system. In the latter method, a given ORF could be considered experimentally validated if there was more than 98% coverage of the entire length of the reference sequence (i.e. predicted gene model) by the assembled contigs from the 454 reads. 
Successful cloning and matched sequence of a given ORF to its predicted gene model would experimentally validate the presence of the hypothesized transcript [48–50].
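The 98% coverage criterion mentioned above can be illustrated with a short Python sketch that merges the reference positions covered by assembled contigs. The interval convention (0-based, half-open), the example coordinates, and the helper names are assumptions made for illustration; they do not reflect the output format of any particular assembler or aligner.

```python
def covered_fraction(ref_length, intervals):
    """Fraction of the reference covered by the union of aligned intervals.

    intervals: (start, end) pairs, 0-based and half-open, on the reference.
    """
    covered = 0
    current_end = 0  # right edge of the region counted so far
    for start, end in sorted(intervals):
        start = max(start, current_end, 0)  # skip bases already counted
        end = min(end, ref_length)          # ignore overhang past the reference
        if end > start:
            covered += end - start
            current_end = end
    return covered / ref_length


def is_validated(ref_length, intervals, threshold=0.98):
    """Apply the 'more than 98% coverage' rule to one predicted gene model."""
    return covered_fraction(ref_length, intervals) > threshold


contigs = [(0, 400), (350, 800), (790, 995)]
fraction = covered_fraction(1000, contigs)
print(fraction, is_validated(1000, contigs))
```

Overlapping contigs are counted once, so redundant reads over the same region do not inflate the coverage estimate.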
3.7 Outlook
Metabolic modeling can be used to improve genome annotation by filling in network gaps and linking biological functions to ORFs. Throughout this process we want to focus on using modeling as a semi-automated tool for generating hypotheses and using manual inspection, biological reasoning, and experimental validation for revising both functional and structural annotations. Iterate through these computational and experimental steps in order to create both higher quality models and annotations. The capabilities of these techniques will improve as annotations for other organisms become more accurate and complete, giving the opportunity to guide and accelerate annotation efforts in existing and new sequences.
Footnotes
Even once a genome has been completely sequenced, determining the structure of ORFs can be difficult with complex initiation, termination and splicing rules, and imperfect gene-calling algorithms [61].
Supplemental methods in [49,56] describe how to draft an initial reconstruction from existing annotations.
See Ref. [52] for an example of distinctions between Category I–III reactions. Category IV reactions are newly defined in this chapter as a de-compartmentalized model was used previously.
- forward-only reactions to [0, 10000]
- reverse-only reactions to [−10000,0]
- reversible reactions to [−10000, 10000]
- and exchange reactions to [−1, 10000] (nutrient uptake is typically a negative flux)
Include only forward reactions. For any reversible reactions, add a new forward reaction that consumes/produces the same metabolites in the opposite direction.
Suggestions may include novel reactions that may not be characterized in other organisms. Preferably, only add novel reactions when a high quality network has a few gaps that cannot be filled by other methods.
FVA calculates the full possible ranges of flux values for all reactions while maintaining a set percentage (default: 100%) of maximal flux through the objective. For identifying blocked reactions in the scope of the entire network, set the optPercentage input value to a lower value such as 10%. In this case, maintaining 100% flux through an objective such as biomass can limit flux through alternative pathways. If all flux from a rate-limiting carbon source is allocated to biomass, then alternative reactions in non-optimal pathways that utilize carbon will have flux ranges of [0,0], yielding additional blocked reactions.
Reversible reactions may need unblocking in only one direction. This occurs when the lower or upper bound of a reversible reaction’s possible flux range is 0. For blocked reactions that are reversible, set the objective function to maximize the negative flux of this reaction because solutions to restoring flux through a blocked reaction in one direction will not necessarily restore flux in the opposite direction. For unblocking multiple reactions with one suggested set of reactions, set the objective to maximize flux through a set of reactions that are blocked.
Biomass production is not always the output in substrate utilization or gene essentiality screens. Thus, when using FBA and similar algorithms, take into account a different objective function that is in line with the measurements from the experimental screen such as metabolite secretion or ATP production.
Run multiple iterations of each algorithm as they rely on sampling of added or removed reactions.
References
- 1. Blaby-Haas CE, de Crecy-Lagard V. Mining high-throughput experimental data to link gene and function. Trends in biotechnology. 2011;29(4):174–182. doi: 10.1016/j.tibtech.2011.01.001.
- 2. Hanson AD, Pribat A, Waller JC, de Crecy-Lagard V. ‘Unknown’ proteins and ‘orphan’ enzymes: the missing half of the engineering parts list–and how to find it. The Biochemical journal. 2010;425(1):1–11. doi: 10.1042/BJ20091328.
- 3. Pouliot Y, Karp PD. A survey of orphan enzyme activities. BMC bioinformatics. 2007;8:244. doi: 10.1186/1471-2105-8-244.
- 4. Rombel IT, Sykes KF, Rayner S, Johnston SA. ORF-FINDER: a vector for high-throughput gene identification. Gene. 2002;282(1–2):33–41. doi: 10.1016/s0378-1119(01)00819-8.
- 5. Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, Venkatesan K, Bethel G, Martin P, Rogers J, Lawlor S, McLaren S, Dricot A, Borick H, Cusick ME, Vandenhaute J, Dunham I, Hill DE, Vidal M. hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes. Genomics. 2007;89(3):307–315. doi: 10.1016/j.ygeno.2006.11.012.
- 6. Frishman D. Protein annotation at genomic scale: the current status. Chemical reviews. 2007;107(8):3448–3466. doi: 10.1021/cr068303k.
- 7. Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Current opinion in structural biology. 2011;21(2):180–188. doi: 10.1016/j.sbi.2011.02.001.
- 8. Emes RD. Inferring function from homology. Methods in molecular biology. 2008;453:149–168. doi: 10.1007/978-1-60327-429-6_6.
- 9. Jones CE, Brown AL, Baumann U. Estimating the annotation error rate of curated GO database sequence annotations. BMC bioinformatics. 2007;8:170. doi: 10.1186/1471-2105-8-170.
- 10. Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. 2010;5(1):93–121. doi: 10.1038/nprot.2009.203.
- 11. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic acids research. 2005;33(Database issue):D34–38. doi: 10.1093/nar/gki063.
- 12. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of molecular biology. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 13. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic acids research. 2008;36(Web Server issue):W5–9. doi: 10.1093/nar/gkn201.
- 14. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27.
- 15. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic acids research. 2004;32(Database issue):D277–280. doi: 10.1093/nar/gkh063.
- 16. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y. KEGG for linking genomes to life and the environment. Nucleic acids research. 2008;36(Database issue):D480–484. doi: 10.1093/nar/gkm882.
- 17. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic acids research. 2003;31(13):3784–3788. doi: 10.1093/nar/gkg563.
- 18. Schneider M, Tognolli M, Bairoch A. The Swiss-Prot protein knowledgebase and ExPASy: providing the plant community with high quality proteomic data and tools. Plant physiology and biochemistry : PPB / Societe francaise de physiologie vegetale. 2004;42(12):1013–1021. doi: 10.1016/j.plaphy.2004.10.009.
- 19. Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nature biotechnology. 2010;28(9):977–982. doi: 10.1038/nbt.1672.
- 20. Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Mueller LA, Ong Q, Paley S, Pujar A, Shearer AG, Travers M, Weerasinghe D, Zhang P, Karp PD. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic acids research. 2012;40(Database issue):D742–753. doi: 10.1093/nar/gkr1014.
- 21. Karp PD, Caspi R. A survey of metabolic databases emphasizing the MetaCyc family. Archives of toxicology. 2011;85(9):1015–1033. doi: 10.1007/s00204-011-0705-2.
- 22. Hertz-Fowler C, Peacock CS, Wood V, Aslett M, Kerhornou A, Mooney P, Tivey A, Berriman M, Hall N, Rutherford K, Parkhill J, Ivens AC, Rajandream MA, Barrell B. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic acids research. 2004;32(Database issue):D339–343. doi: 10.1093/nar/gkh007.
- 23. Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC bioinformatics. 2012;13(1):6. doi: 10.1186/1471-2105-13-6.
- 24. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh LS. UniProt: the Universal Protein knowledgebase. Nucleic acids research. 2004;32(Database issue):D115–119. doi: 10.1093/nar/gkh131.
- 25. Bolser DM, Chibon PY, Palopoli N, Gong S, Jacob D, Del Angel VD, Swan D, Bassi S, Gonzalez V, Suravajhala P, Hwang S, Romano P, Edwards R, Bishop B, Eargle J, Shtatland T, Provart NJ, Clements D, Renfro DP, Bhak D, Bhak J. MetaBase–the wiki-database of biological databases. Nucleic acids research. 2012;40(Database issue):D1250–1254. doi: 10.1093/nar/gkr1099.
- 26. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular systems biology. 2006;2:2006 0008. doi: 10.1038/msb4100050.
- 27. Yamamoto N, Nakahigashi K, Nakamichi T, Yoshino M, Takai Y, Touda Y, Furubayashi A, Kinjyo S, Dose H, Hasegawa M, Datsenko KA, Nakayashiki T, Tomita M, Wanner BL, Mori H. Update on the Keio collection of Escherichia coli single-gene deletion mutants. Molecular systems biology. 2009;5:335. doi: 10.1038/msb.2009.92.
- 28. Zhang R, Ou HY, Zhang CT. DEG: a database of essential genes. Nucleic acids research. 2004;32(Database issue):D271–272. doi: 10.1093/nar/gkh024.
- 29. Zhang R, Lin Y. DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic acids research. 2009;37(Database issue):D455–458. doi: 10.1093/nar/gkn858.
- 30. Chen WH, Minguez P, Lercher MJ, Bork P. OGEE: an online gene essentiality database. Nucleic acids research. 2012;40(Database issue):D901–906. doi: 10.1093/nar/gkr986.
- 31. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novere N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J, Forum S. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015.
- 32. Schellenberger J, Park JO, Conrad TM, Palsson BO. BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC bioinformatics. 2010;11:213. doi: 10.1186/1471-2105-11-213.
- 33. Pabinger S, Rader R, Agren R, Nielsen J, Trajanoski Z. MEMOSys: Bioinformatics platform for genome-scale metabolic models. BMC systems biology. 2011;5:20. doi: 10.1186/1752-0509-5-20.
- 34. Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, Kang J, Hyduke DR, Palsson BO. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature protocols. 2011;6(9):1290–1307. doi: 10.1038/nprot.2011.308.
- 35. Keating SM, Bornstein BJ, Finney A, Hucka M. SBMLToolbox: an SBML toolbox for MATLAB users. Bioinformatics. 2006;22(10):1275–1277. doi: 10.1093/bioinformatics/btl111.
- 36. Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic engineering. 2003;5(4):264–276. doi: 10.1016/j.ymben.2003.09.002.
- 37. Chavali AK, D’Auria KM, Hewlett EL, Pearson RD, Papin JA. A metabolic network approach for the identification and prioritization of antimicrobial drug targets. Trends in microbiology. 2012;20(3):113–123. doi: 10.1016/j.tim.2011.12.004.
- 38. Satish Kumar V, Dasika MS, Maranas CD. Optimization based automated curation of metabolic reconstructions. BMC bioinformatics. 2007;8:212. doi: 10.1186/1471-2105-8-212.
- 39. Reed JL, Patel TR, Chen KH, Joyce AR, Applebee MK, Herring CD, Bui OT, Knight EM, Fong SS, Palsson BO. Systems approach to refining genome annotation. Proceedings of the National Academy of Sciences of the United States of America. 2006;103(46):17480–17484. doi: 10.1073/pnas.0603364103.
- 40. Karp PD, Paley S, Romero P. The Pathway Tools software. Bioinformatics. 2002;18(Suppl 1):S225–232. doi: 10.1093/bioinformatics/18.suppl_1.s225.
- 41. Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L, Altman T, Paulsen I, Keseler IM, Caspi R. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Briefings in bioinformatics. 2010;11(1):40–79. doi: 10.1093/bib/bbp043.
- 42. Latendresse M, Krummenacker M, Trupp M, Karp PD. Construction and completion of flux balance models from pathway databases. Bioinformatics. 2012;28(3):388–396. doi: 10.1093/bioinformatics/btr681.
- 43. Green ML, Karp PD. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC bioinformatics. 2004;5:76. doi: 10.1186/1471-2105-5-76.
- 44. Green ML, Karp PD. Using genome-context data to identify specific types of functional associations in pathway/genome databases. Bioinformatics. 2007;23(13):i205–211. doi: 10.1093/bioinformatics/btm213.
- 45. Kumar VS, Maranas CD. GrowMatch: an automated method for reconciling in silico/in vivo growth predictions. PLoS computational biology. 2009;5(3):e1000308. doi: 10.1371/journal.pcbi.1000308.
- 46. Herrgard MJ, Fong SS, Palsson BO. Identification of genome-scale metabolic network models using experimentally measured flux profiles. PLoS computational biology. 2006;2(7):e72. doi: 10.1371/journal.pcbi.0020072.
- 47. Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, Broadbelt LJ. Exploring the diversity of complex metabolic networks. Bioinformatics. 2005;21(8):1603–1609. doi: 10.1093/bioinformatics/bti213.
- 48. Ghamsari L, Balaji S, Shen Y, Yang X, Balcha D, Fan C, Hao T, Yu H, Papin JA, Salehi-Ashtiani K. Genome-wide functional annotation and structural verification of metabolic ORFeome of Chlamydomonas reinhardtii. BMC genomics. 2011;12(Suppl 1):S4. doi: 10.1186/1471-2164-12-S1-S4.
- 49. Manichaikul A, Ghamsari L, Hom EF, Lin C, Murray RR, Chang RL, Balaji S, Hao T, Shen Y, Chavali AK, Thiele I, Yang X, Fan C, Mello E, Hill DE, Vidal M, Salehi-Ashtiani K, Papin JA. Metabolic network analysis integrated with transcript verification for sequenced genomes. Nature methods. 2009;6(8):589–592. doi: 10.1038/nmeth.1348.
- 50. Chang RL, Ghamsari L, Manichaikul A, Hom EF, Balaji S, Fu W, Shen Y, Hao T, Palsson BO, Salehi-Ashtiani K, Papin JA. Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism. Molecular systems biology. 2011;7:518. doi: 10.1038/msb.2011.52.
- 51. Orth JD, Palsson BO. Systematizing the generation of missing metabolic knowledge. Biotechnology and bioengineering. 2010;107(3):403–412. doi: 10.1002/bit.22844.
- 52. Rolfsson O, Palsson BO, Thiele I. The human metabolic reconstruction Recon 1 directs hypotheses of novel human metabolic functions. BMC systems biology. 2011;5:155. doi: 10.1186/1752-0509-5-155.
- 53. Oberhardt MA, Chavali AK, Papin JA. Flux balance analysis: interrogating genome-scale metabolic networks. Methods in molecular biology. 2009;500:61–80. doi: 10.1007/978-1-59745-525-1_3.
- 54. Joyce AR, Reed JL, White A, Edwards R, Osterman A, Baba T, Mori H, Lesely SA, Palsson BO, Agarwalla S. Experimental and computational assessment of conditionally essential genes in Escherichia coli. Journal of bacteriology. 2006;188(23):8259–8271. doi: 10.1128/JB.00740-06.
- 55. Feist AM, Palsson BO. The biomass objective function. Current opinion in microbiology. 2010;13(3):344–349. doi: 10.1016/j.mib.2010.03.003.
- 56. Chavali AK, Whittemore JD, Eddy JA, Williams KT, Papin JA. Systems analysis of metabolism in the pathogenic trypanosomatid Leishmania major. Molecular systems biology. 2008;4:177. doi: 10.1038/msb.2008.15.
- 57. Orth JD, Palsson BO. Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC systems biology. 2012;6(1):30. doi: 10.1186/1752-0509-6-30.
- 58. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BO. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(6):1777–1782. doi: 10.1073/pnas.0610772104.
- 59. Yeku O, Frohman MA. Rapid amplification of cDNA ends (RACE). Methods in molecular biology. 2011;703:107–122. doi: 10.1007/978-1-59745-248-9_8.
- 60. Frohman MA, Dush MK, Martin GR. Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proceedings of the National Academy of Sciences of the United States of America. 1988;85(23):8998–9002. doi: 10.1073/pnas.85.23.8998.
- 61. Jones SJ. Prediction of genomic functional elements. Annual review of genomics and human genetics. 2006;7:315–338. doi: 10.1146/annurev.genom.7.080505.115745.
