Abstract
Motivation
The increasing availability of annotated genome sequences enables construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, due to incomplete genome annotations, draft metabolic models contain gaps that must be filled in a time-consuming process before they are usable. Optimization-based algorithms that fill these gaps have been developed; however, gap-filling algorithms show significant error rates and often introduce incorrect reactions.
Results
Here, we present a new gap-filling method that computes the costs of candidate gap-filling reactions from a universal reaction database (MetaCyc) based on taxonomic information. When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost for reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The assumption behind this method is that different taxonomic groups are biased toward using different metabolic reactions. Evaluation of the new gap-filler on randomly degraded variants of the EcoCyc metabolic model for E.coli showed an increase in the average F1-score to 99.0 (when using the variable weights by frequency method at the phylum level), compared to 91.0 using the previous MetaFlux gap-filler and 80.3 using a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements.
Availability and implementation
The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
In recent years, the number of annotated microbial genome sequences has rapidly increased and with that growth has come a corresponding rise in the number of genome-scale metabolic network models (GEMs). GEMs have found applications in many areas, including metabolic engineering studies, enzyme discovery, evolutionary studies and community modeling (McCloskey et al., 2013). Flux balance analysis (FBA) is one of the basic constraint-based modeling frameworks used to study GEMs (Orth et al., 2010). Applied to a GEM, FBA shows if the microorganism will grow given a defined set of nutrients, secretions, biomass metabolites and reactions. We say a GEM is predicted to grow under a particular condition if all biomass metabolites are produced from the given nutrients.
Various software packages enable automatically generating a draft model given an annotated genome sequence (DeJongh et al., 2007; Faria et al., 2018; Hamilton and Reed, 2014; Karp et al., 2016; Machado et al., 2018; Wang et al., 2018). This step typically relies on reference metabolic databases such as KEGG (Kanehisa et al., 2017), ModelSEED (Henry et al., 2010), BiGG (King et al., 2016) or MetaCyc (Caspi et al., 2018) to connect genes to their metabolic reactions. The draft model then undergoes multiple steps of refinements to generate a final, high-quality model (Thiele and Palsson, 2010). One of the most significant bottlenecks in developing a metabolic model is the time required to add those reactions to the network that were initially omitted due to genome-annotation incompleteness. Such ‘network gaps’ prevent producing one or more biomass metabolites from the nutrients. Several gap-filling algorithms (Orth and Palsson, 2010; Pan and Reed, 2018) have been developed that address this problem by adding a minimal number of reactions to a model to produce as many biomass metabolites as possible (Henry et al., 2010; Latendresse and Karp, 2018; Latendresse et al., 2012; Reed et al., 2006). Other variations of the gap-filling problem include (i) enabling production and consumption of all metabolites, not just biomass metabolites (Kumar et al., 2007; Liu et al., 2017); (ii) adding or removing reactions to minimize the total difference between measured and predicted fluxes (Herrgård et al., 2006); and (iii) using additional information to prioritize candidate reactions, such as gene-expression information (Vitkin and Shlomi, 2012). These algorithms use a variety of optimization techniques and draw candidate reactions from different reference databases (KEGG, MetaCyc, etc.).
Our past studies (Karp et al., 2018a; Latendresse and Karp, 2018) have shown that reaction gap-filling suffers a significant error rate, meaning that gap-fillers often introduce incorrect reactions into the network. Errors can generally result from three possible causes: (i) The reaction database used by the gap-filler does not contain some of the required reactions (this was not the case for our experiments in Karp et al. (2018a); Latendresse and Karp (2018); or here). (ii) Time constraints or numerical errors prevent the solver used by the gap-filler from finding the correct solution (we did see this cause at a low rate in Karp et al. (2018a); Latendresse and Karp (2018); and here). (iii) The existence of alternative solutions precludes finding the correct result (a gap-filler can complete a network model in multiple ways by using the same reaction database due to the large number of reactions in the database; we believe this is the most prevalent cause of MetaFlux gap-filler accuracy issues).
In this work, we present an enhanced version of the MetaFlux mixed integer linear programming (MILP)-based gap-filler (Karp et al., 2016; Latendresse and Karp, 2018). This enhanced gap-filler uses a taxonomic gap-filling approach that assigns costs to candidate MetaCyc reactions based on taxonomic information. The assumption here is that different taxonomic groups are biased toward using different metabolic reactions. For example, when gap-filling the Escherichia coli metabolic network, the algorithm assigns a lower cost to reactions found in the BioCyc databases for organisms in the same phylum (Proteobacteria) as E.coli. We believe this approach identifies more biologically relevant solutions than basic gap-filling and guides the gap-filler when alternative solutions are typically present; indeed, our results support that hypothesis.
In the following sections, we present our evaluation of several gap-filler algorithms and parameter variations on randomly degraded metabolic models, a common method of assessing gap-filler accuracy (Latendresse and Karp, 2018; Liu et al., 2017). The models are for E.coli K-12 MG1655, Lactobacillus rhamnosus GG and Bacteroides thetaiotaomicron VPI-5482. We then describe and discuss the results from our computational gap-filling experiments and end by presenting our conclusions.
2 Materials and methods
2.1 Computational experiments
All computational experiments were performed using MetaFlux version 23.0 and genome-scale metabolic models for E.coli K-12 MG1655 (Weaver et al., 2014), L.rhamnosus GG (unpublished) and B.thetaiotaomicron VPI-5482 (unpublished). The section below details each model. Starting from an FBA-solvable state (i.e. all biomass metabolites in the model can be produced given a defined set of nutrients, secretions and reactions; in other words, the model can produce biomass), we identified the subset of essential reactions from the resulting active reaction set (active reactions are those reactions that carry flux), excluding transport and any generic or instantiated reactions. (A reaction is considered essential if its removal from the network causes the model to not grow. A generic reaction is a reaction that has at least one compound class as a substrate and is instantiated before the model is solved. Instantiating a generic reaction results in one or more reactions with the compound class replaced with specific applicable compound instances.) Transport reactions were excluded because they are usually curated within the organism PGDB and not included in MetaCyc, and thus would be unavailable as candidate reactions if removed from the model. We also excluded generic and instantiated reactions from removal because we would be unable to collect accurate gap-filler performance metrics for them: at this point, we cannot choose to remove only one of the specific instantiated reactions of a generic reaction, and removing one generic reaction may result in the removal of multiple instantiated reactions. No such restrictions are made on the candidate reaction set (e.g. a candidate reaction from MetaCyc may be an instantiated reaction).
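The essentiality test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `essential_reactions` and the `grows` predicate are hypothetical names, and `grows` stands in for an FBA solve (here replaced by a toy rule); none of this is part of MetaFlux.

```python
def essential_reactions(network, active, grows, excluded=()):
    """Return the active reactions whose individual removal stops growth.

    network  -- all reaction identifiers in the model
    active   -- reactions carrying flux in the solved model
    grows    -- hypothetical predicate: set of reactions -> bool (stand-in for FBA)
    excluded -- reactions never tested (e.g. transport, generic, instantiated)
    """
    network = set(network)
    skip = set(excluded)
    return sorted(
        r for r in active
        if r not in skip and not grows(network - {r})
    )


# Toy stand-in for FBA: growth needs r1, the transporter, and either r2a or r2b.
def toy_grows(reactions):
    return (
        "r1" in reactions
        and "transport1" in reactions
        and ("r2a" in reactions or "r2b" in reactions)
    )


toy_network = {"r1", "r2a", "r2b", "transport1"}
toy_active = {"r1", "r2a", "transport1"}
toy_essential = essential_reactions(
    toy_network, toy_active, toy_grows, excluded={"transport1"}
)
```

In the toy network, `r2a` carries flux but is not essential because `r2b` is an alternative route, which is also the situation that gives a gap-filler more than one valid solution.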
For each experiment, a set of reactions was randomly selected from the subset of essential active reactions, and these reactions were removed from their respective metabolic networks to create gapped models. We performed 100 simulations for three cases with different numbers of reactions removed, specifically 1, 5 or 10 removed reactions, for a total of 300 simulations (where each simulation is performed on a different set of reactions). The same sets of 1, 5 or 10 reactions were used for all experiments for the same model (i.e. for the sets with one reaction removed, the same 100 single reaction sets were used for all experiments with different weighting schemes). For each of the 300 simulations, only the sets of reactions removed were changed; all other simulation parameters (such as nutrients, secreted metabolites and biomass metabolites) were held constant. Biomass metabolite coefficients were ignored for gap-filling experiments (i.e. all metabolites have a coefficient of 1 or −1). All solutions were verified in FBA with their biomass metabolite coefficients restored.
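As a rough sketch of how such removal sets could be drawn, the following Python function samples distinct random subsets of the essential active reactions. The function name, the seed and the reaction identifiers are illustrative assumptions; the sampling procedure actually used by the authors is not specified beyond the description above.

```python
import random
from math import comb


def make_removal_sets(essential, set_size, n_sets=100, seed=0):
    """Draw up to n_sets distinct random subsets of size set_size."""
    rng = random.Random(seed)          # fixed seed so every scheme sees the same sets
    pool = sorted(essential)
    # There may be fewer possible subsets than requested (e.g. 81 single reactions).
    target = min(n_sets, comb(len(pool), set_size))
    chosen, seen = [], set()
    while len(chosen) < target:
        candidate = frozenset(rng.sample(pool, set_size))
        if candidate not in seen:
            seen.add(candidate)
            chosen.append(sorted(candidate))
    return chosen


# Hypothetical identifiers for a model with 81 essential active reactions.
toy_essential = [f"RXN-{i}" for i in range(81)]
removal_sets = {k: make_removal_sets(toy_essential, k) for k in (1, 5, 10)}
```

Reusing the same `removal_sets` for every weighting scheme keeps the comparison between schemes paired, as described above.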
Each gapped model was gap-filled, under each of the weighting schemes tested in this work, using the MetaFlux GenDev Technique C gap-filler (Latendresse and Karp, 2018), which is an MILP-based gap-filler and the default mode for gap-filling starting with MetaFlux version 22.0. Candidate metabolic reactions were drawn from the MetaCyc version 23.0 database, which consists of 15 904 reactions. A maximum of 30 min of elapsed time was allowed for each gap-fill run. A solution is found when all biomass metabolites can be produced with a total biomass flux of 0.001/h. MetaFlux version 23.0 uses the SCIP solver (version 6.0.0) (Gleixner et al.) for MILP-based gap-filling and for linear programming (LP)-based model solving.
2.2 Genome-scale metabolic models
2.2.1 Escherichia coli K-12 MG1655
We first used the genome-scale metabolic model of the released EcoCyc version 23.0 database (ECOLI), as it is the best-curated organism database, called a Pathway/Genome Database (PGDB), and the results of the model have been verified with experimental datasets (Karp et al., 2018b; Weaver et al., 2014). The EcoCyc PGDB is a Tier 1 (highly curated—at least one person-year of curation) database whose contents have been derived from 36 000 publications during the course of multiple person-decades of curation effort.
When solved using FBA aerobically on glucose, this model produced 85 biomass metabolites with a biomass flux of 1.006/h with 503 active reactions. The subset of essential active reactions contained 221 reactions, excluding transport and generic or instantiated reactions. ECOLI contains 2552 reactions, 2965 metabolites and 4500 genes. ECOLI is the representative model from the Proteobacteria phylum. We performed several computational experiments using the ECOLI model to evaluate different gap-filler weight parameters before performing additional experiments on models for two other organisms.
2.2.2 L.rhamnosus GG
The LactorhaCyc version 23.0 database (LRHA) is a Tier 2 database (moderately curated – <1 year) within BioCyc, for which a semi-completed genome-scale metabolic model has been developed. This model does not include any biomass metabolite coefficients and thus is not a fully refined model, and it generates only a partial list of biomass metabolites. The LRHA model was tested in aerobic conditions with glucose as the carbon source supplemented with 14 amino acids. When solved using FBA, this model produced 46 biomass metabolites with a biomass flux of 0.115/h with 221 active reactions. The subset of essential active reactions numbered 81, excluding transport and generic or instantiated reactions. LRHA contains 1071 reactions, 848 metabolites and 2775 genes. LRHA is the representative model from the Firmicutes phylum. With only 81 essential active reactions, only 81 simulations were run when evaluating the case with one reaction removed instead of the usual 100 simulations.
2.2.3 B.thetaiotaomicron VPI-5482
The BtheCyc version 23.1 database (BTHE) is another Tier 2 database within BioCyc for which a genome-scale metabolic model has been developed. Here, version 23.1 was used, as the PGDB was upgraded in this work after the public version 23.0 was released. The BTHE model was tested in anaerobic conditions with glucose and carbon dioxide. When solved using FBA, this model produced 34 biomass metabolites with a biomass flux of 0.0889/h with 192 active reactions. The subset of essential active reactions included 109 reactions, excluding transport and generic or instantiated reactions. BTHE contains 1451 reactions, 1039 metabolites and 4902 genes. BTHE is the representative model from the Bacteroidetes phylum. Neither the BTHE nor the LRHA model has been verified against experimental results, but both have received some manual curation and serve as additional verification of the gap-filler weighting schemes proposed in this work. The MetaFlux FBA files for all three models can be found in the Supplementary Files (see section on Software Availability).
2.3 Candidate reaction-weighting schemes
The MetaFlux gap-filler relies on MetaCyc as the universal reaction database from which to draw reactions to complete a given metabolic network model. MetaCyc is a curated database of experimentally elucidated metabolic reactions from all domains of life. Version 23.0 of MetaCyc contains 2722 pathways and 17 203 reactions. In this work, we explored different gap-filler variations that use different candidate reaction-weighting schemes to see if we could improve MetaFlux gap-filler accuracy. The objective function used for the gap-filler experiments is

$$\max \; \sum_{b} w_b \, s_b + \sum_{r} c_r \, a_r$$

where $w_b$ is the weight associated with each biomass metabolite $b$, which is set to 10 000; $c_r$ is the cost associated with each candidate reaction $r$ (costs tested in this work ranged from −1 to −260); $s_b$ is the Boolean variable that controls the inclusion of the biomass metabolite; and $a_r$ is the Boolean variable that controls the inclusion of the candidate reaction.
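To illustrate how the biomass weight and the reaction costs interact, the short sketch below scores two hypothetical solutions. It assumes the objective is the weighted count of produced biomass metabolites plus the sum of the (negative) costs of the added reactions; the function name and the example values are illustrative only.

```python
BIOMASS_WEIGHT = 10_000  # weight of each produced biomass metabolite


def objective(n_biomass_produced, added_reaction_costs):
    """Score a gap-filling solution: reward biomass, penalize added reactions."""
    return BIOMASS_WEIGHT * n_biomass_produced + sum(added_reaction_costs)


# Hypothetical solution A: all 85 biomass metabolites, three reactions at cost -30.
score_a = objective(85, [-30, -30, -30])
# Hypothetical solution B: adds nothing but produces only 84 biomass metabolites.
score_b = objective(84, [])
```

Because the weight is far larger than any single cost, producing one more biomass metabolite outweighs the penalty for adding reactions in this example, and the costs mainly decide between alternative solutions that produce the same metabolites.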
In the following descriptions of gap-filling algorithms, when gap-filling the metabolic network for an organism M (e.g. E.coli), some of the algorithm variations we used consider information about a higher taxonomic group containing M, such as at the phylum or order levels (e.g. Proteobacteria). We call that higher taxonomic group the parent group. Reactions present in the parent group were assessed using the BioCyc database collection (Karp et al., 2019) by employing a method described below. The gap-filler variations we explored were as follows.
2.3.1 Basic gap-filler (BasicGap)
All candidate reactions are assigned the same cost; in this case, we set the cost of all MetaCyc reactions to −50. This method will serve as a comparison with SMILEY (Reed et al., 2006) and ModelSEED (Henry et al., 2010), which are gap-filler algorithms that attempt to add the smallest number of reactions from a universal reaction database such that the model is predicted to grow on the defined growth condition. Setting all candidate reactions to the same cost essentially simulates what SMILEY and ModelSEED do as all reactions have an equal chance of being considered (except that ModelSEED also allows relaxation of reaction reversibility).
2.3.2 MetaFlux gap-filler (MetaFluxC)
The default gap-filler distributed within MetaFlux version 23.0 uses a weak type of taxonomic information. It assigns a cost of −30 to all candidate reactions in the taxonomic range of M (the organism being gap-filled); −40 for candidate reactions of unknown taxonomic range; −50 for candidate reactions outside of the taxonomic range of the organism being gap-filled; and −1 for candidate reactions that are transport reactions or spontaneous reactions. The taxonomic range of a reaction in MetaCyc comes from taxonomic information assigned to pathways by a curator (e.g. if a pathway is curated as occurring in Proteobacteria, then all reactions in that pathway are inferred to occur in Proteobacteria).
2.3.3 Fixed weights by taxonomic group in rank (TaxFixed)
Consider a candidate reaction R from MetaCyc. If at least one BioCyc PGDB for an organism in the parent taxonomic group, P, has the MetaCyc candidate reaction R (meaning that, when gap-filling organism M at the phylum level, the element Phylum[P,R] of the phylum taxonomic reaction matrix is non-zero; see Section 2.4 below), R will be assigned a cost of −30; otherwise, R will be assigned a cost of −50. A cost of −1 is assigned to all candidate reactions that are transport reactions or spontaneous reactions. For example, if we were gap-filling the ECOLI model at the phylum level, any MetaCyc candidate reaction that exists in any Proteobacteria BioCyc PGDB would be assigned a cost of −30. Four taxonomic levels were explored: phylum, class, order and family. For ECOLI, these correspond to Proteobacteria, Gammaproteobacteria, Enterobacterales and Enterobacteriaceae, respectively. Note that each taxonomic level is explored separately and is run as a separate experiment.
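The TaxFixed rule can be written as a small function. This is a minimal sketch of the rule as stated above, not MetaFlux code; the function and parameter names are assumptions made for illustration.

```python
def tax_fixed_cost(count_in_parent, is_transport_or_spontaneous=False):
    """Binary cost based on presence of the reaction in the parent taxon.

    count_in_parent -- TRM entry, e.g. Phylum[P, R]: number of PGDBs in the
                       parent group P that contain reaction R
    """
    if is_transport_or_spontaneous:
        return -1
    return -30 if count_in_parent > 0 else -50


cost_present = tax_fixed_cost(1)    # found in a single PGDB of the parent group
cost_common = tax_fixed_cost(5000)  # found in thousands of PGDBs
cost_absent = tax_fixed_cost(0)
```

A reaction found in a single PGDB receives the same cost as one found in thousands, so this scheme cannot rank alternative reactions that are both present in the parent group.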
2.3.4 Variable weights by frequency in taxonomic group in rank (TaxVariable-Freq)
Candidate MetaCyc reactions R are assigned a different cost based on the frequency at which R exists in all BioCyc-B PGDBs for the parent taxonomic group P (see Section 2.4 below for definition of BioCyc-B). For example, if we were performing an experiment using the ECOLI model at the phylum level, if R exists in 50% of all Proteobacteria BioCyc-B PGDBs, it will be assigned a cost of −35 based on the following equations:

$$cost(R) = base - \left( range - freq \times \frac{range}{100} \right) \quad \text{if } R \text{ exists in at least one PGDB of } P$$

$$cost(R) = 2 \times (base - range) \quad \text{otherwise}$$

where base is a base cost of −30; freq is the frequency percentage rounded to the closest positive integer and is calculated from Phylum[Proteobacteria,R]/N, where N is the number of Proteobacteria PGDBs in BioCyc-B (see Section 2.4 below); and range is a spread of the cost to be added onto the base and is set to 10. Specifically, we use range to scale the frequency percentage, which goes from 0 to 100, to a range of 0 to 10 to differentiate reactions with different frequencies without overshadowing the base cost. Therefore, if a candidate reaction exists in more than 99.5% of Proteobacteria BioCyc-B PGDBs, it will be assigned a cost of −30. If the candidate reaction does not exist in any Proteobacteria BioCyc-B PGDB, it will be assigned a cost of −80. A cost of −1 is assigned to all candidate reactions that are transport reactions or spontaneous reactions.
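The following sketch reproduces the worked values quoted above (−35 at 50%, −30 above 99.5%, −80 when absent). It assumes that the out-of-taxon cost follows the 2*(base − range) formula listed in Table 2 and that "rounded to the closest positive integer" means round-half-up with a floor of 1; both the function names and that rounding convention are assumptions, not confirmed details of MetaFlux.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def tax_variable_freq_cost(count, n_pgdbs, base=-30, cost_range=10,
                           is_transport_or_spontaneous=False):
    """Cost of a candidate reaction from its frequency in the parent taxon.

    count   -- TRM entry: PGDBs in the parent group containing the reaction
    n_pgdbs -- total number of PGDBs in the parent group
    """
    if is_transport_or_spontaneous:
        return -1
    if count == 0:
        return 2 * (base - cost_range)            # out-of-taxon cost
    freq = max(1, round_half_up(100 * count / n_pgdbs))
    return base - (cost_range - freq * cost_range / 100)


cost_half = tax_variable_freq_cost(2928, 5856)   # present in 50% of PGDBs
cost_all = tax_variable_freq_cost(5856, 5856)    # present in every PGDB
cost_none = tax_variable_freq_cost(0, 5856)      # absent from the phylum
```

With the default parameters, in-taxon costs fall between −40 and −30, well separated from the out-of-taxon cost of −80.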
2.3.5 Variable weights by normalized frequency in taxonomic group in rank (TaxVariable-NormFreq)
Candidate MetaCyc reactions R are assigned a different cost based on the average of the normalized frequency with which R exists in all BioCyc PGDBs at the taxonomic rank one level down. For example, if we were performing an experiment using the ECOLI model at the phylum level, we would calculate the average of the frequency percentages at which a reaction exists in all the taxa that are direct children of the Proteobacteria phylum. The frequency percentage of a reaction within taxon c1 is calculated as the ratio of BioCyc-B PGDBs within c1 that have the reaction over the total number of PGDBs in c1, rounded to the closest positive integer. The frequency percentage of the same reaction is then evaluated in the same way for all other taxa c under the phylum being tested. Then, the average of all frequency percentage values is calculated, and a final cost can be assigned to the reaction based on the following equations:

$$normfreq = \frac{1}{|C|} \sum_{c \in C} freq_c$$

$$cost(R) = base - \left( range - normfreq \times \frac{range}{100} \right)$$

where C is the set of taxa that are direct children of the parent group and freq_c is the frequency percentage of R within taxon c; base is the base cost of −30; normfreq is the average normalized frequency percentage rounded to the closest positive integer; and range is a spread of the cost to be added onto the base and is set to 10. Therefore, if a candidate reaction exists in more than 99.5% of Proteobacteria BioCyc PGDBs (i.e. if it exists in more than 99.5% of PGDBs for each taxon below Proteobacteria), it will be assigned a cost of −30. If R does not exist in any Proteobacteria BioCyc PGDB, it will be assigned a cost of −80. A cost of −1 is assigned to all candidate reactions that are transport reactions or spontaneous reactions.
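A small sketch shows how normalization changes the cost when one child taxon dominates the PGDB count. The rounding convention (round-half-up per child taxon, with the final average floored at 1), the 2*(base − range) out-of-taxon formula and all names are assumptions made for illustration; the counts are invented.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def norm_freq_cost(child_counts, base=-30, cost_range=10):
    """Cost from the average per-child-taxon frequency percentage.

    child_counts -- list of (pgdbs_with_reaction, total_pgdbs), one pair per
                    direct child taxon of the parent group
    """
    if all(count == 0 for count, _ in child_counts):
        return 2 * (base - cost_range)            # absent from the whole group
    percents = [round_half_up(100 * count / total) for count, total in child_counts]
    normfreq = max(1, round_half_up(sum(percents) / len(percents)))
    return base - (cost_range - normfreq * cost_range / 100)


# Invented example: a heavily sequenced class (900 of 1000 PGDBs contain the
# reaction) next to a sparsely sequenced class (1 of 10 PGDBs).
children = [(900, 1000), (1, 10)]
normalized_cost = norm_freq_cost(children)

# The pooled frequency lets the large class dominate.
pooled_freq = round_half_up(100 * (900 + 1) / (1000 + 10))
pooled_cost = -30 - (10 - pooled_freq * 10 / 100)
```

In this example the pooled frequency is 89%, giving a cost near −31.1, whereas averaging the two classes equally gives 50% and a cost of −35.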
We used R (R Core Team, 2019) and a Kruskal–Wallis rank-sum test to compare performance among the various weighting schemes. Tests used the 0.05 cutoff for significance and had one degree of freedom (df) unless otherwise specified.
2.4 Generation of taxonomic reaction matrices
All of the preceding taxonomic reaction-weighting schemes are computed from a set of large taxonomic reaction matrices (TRMs) derived from the BioCyc database collection. The matrices were generated using BioCyc version 23.0 except for B.thetaiotaomicron VPI-5482, which is at version 23.1 as it was updated in this work after version 23.0 was released. BioCyc version 23.0 contains 14 728 organism databases or PGDBs. About 14 206 of those PGDBs are in the bacteria superkingdom; we call that set of bacterial PGDBs BioCyc-B. BioCyc-B was used to populate the TRMs.
Each PGDB contains a large variety of datatypes, including genes, proteins, reactions, pathways and metabolites. However, only the PGDB reactions were used to populate the matrices. When generating the TRMs, we took advantage of a design property of BioCyc, which is that each unique reaction in BioCyc has a unique identifier (e.g. TRYPSYN-RXN), and the same unique identifier is assigned to a given biochemical reaction in every BioCyc PGDB. We can thus compute the union of all biochemical reaction objects found across all of BioCyc-B by taking the union of all reaction unique identifiers across BioCyc-B. That set of unique reactions of BioCyc-B contains 17 203 reactions.
We generated a TRM at each of the following taxonomic levels: phylum, class, order and family. All taxonomic information used in this work was derived from the January 8, 2019 version of the NCBI Taxonomy database (Sayers et al., 2009). The columns of each TRM are reactions from BioCyc-B; the rows of each TRM represent taxonomic groups. Each TRM is constructed in a similar fashion. Each of the four matrices contains 17 203 columns, one for each reaction in BioCyc-B. Consider the phylum matrix. Because the bacteria in BioCyc-B span 37 phyla, the phylum matrix contains 37 rows. The phylum matrix element Phylum[P,R] contains an integer that is the count of the number of PGDBs in BioCyc-B that are in phylum P and contain reaction R. When constructing a matrix to gap-fill a given organism (e.g. E.coli), we omit from the matrix data from the BioCyc-B PGDB for that organism so as not to bias the TRM toward an organism whose full metabolic network should be unknown at the time of gap-filling. The family matrix contains 372 rows because the bacteria in BioCyc-B span 372 taxonomic families. Element Family[P,R] of the family matrix contains a count of the number of PGDBs in BioCyc-B that are in family P and contain reaction R. For those of our computational experiments that relied on the TRMs (i.e. TaxVariable-Freq and TaxVariable-NormFreq), each experiment was executed at one taxonomic level, and hence accessed the matrix appropriate to that level.
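The construction of a TRM can be sketched as a counting pass over the PGDBs. The data layout below (a dictionary from PGDB name to its taxon and reaction set) and all names except TRYPSYN-RXN are hypothetical; the sketch only illustrates the counting and the omission of the organism being gap-filled.

```python
from collections import defaultdict


def build_trm(pgdbs, exclude=None):
    """Count, per taxon, how many PGDBs contain each reaction.

    pgdbs   -- dict: PGDB name -> (taxon at the chosen rank, set of reaction IDs)
    exclude -- PGDB of the organism being gap-filled, omitted from the counts
    Returns (counts, n_pgdbs) where counts[taxon][reaction] is the TRM entry.
    """
    counts = defaultdict(lambda: defaultdict(int))
    n_pgdbs = defaultdict(int)
    for name, (taxon, reactions) in pgdbs.items():
        if name == exclude:
            continue
        n_pgdbs[taxon] += 1
        for reaction in reactions:
            counts[taxon][reaction] += 1
    return counts, n_pgdbs


# Toy collection; organism names and RXN-A/RXN-B are placeholders.
toy_pgdbs = {
    "org1": ("Proteobacteria", {"TRYPSYN-RXN", "RXN-A"}),
    "org2": ("Proteobacteria", {"TRYPSYN-RXN"}),
    "org3": ("Firmicutes", {"RXN-B"}),
    "target": ("Proteobacteria", {"TRYPSYN-RXN", "RXN-B"}),
}
phylum_counts, phylum_sizes = build_trm(toy_pgdbs, exclude="target")
```

Because unique reaction identifiers are shared across PGDBs, the identifier itself can serve as the column key, and a zero entry simply means that no PGDB in that taxon contains the reaction.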
Table 1 summarizes TRM statistics for the organisms used for evaluation in the study. One of the 37 rows in the phylum TRM is for the Proteobacteria, which is the phylum containing E.coli. That row integrates reaction data from 5856 BioCyc databases; 7577 of the columns in that row are non-zero, meaning that those reactions are present in one or more of those 5856 BioCyc databases. Similarly, one of the 372 rows in the family TRM is for the Enterobacteriaceae, which is the family containing E.coli. That row integrates data from 5398 reactions found in 1115 BioCyc organisms.
Table 1.
TRM statistics summary for the organisms used in this study
| Rank | Common name | PGDBs | Reactions |
|---|---|---|---|
| Escherichia coli | | | |
| Phylum | Proteobacteria | 5856 | 7577 |
| Class | Gammaproteobacteria | 3097 | 6873 |
| Order | Enterobacterales | 1330 | 5779 |
| Family | Enterobacteriaceae | 1115 | 5398 |
| Lactobacillus rhamnosus | | | |
| Phylum | Firmicutes | 4013 | 5905 |
| Class | Bacilli | 2978 | 5506 |
| Order | Lactobacillales | 1527 | 4242 |
| Family | Lactobacillaceae | 367 | 3495 |
| Bacteroides thetaiotaomicron | | | |
| Phylum | Bacteroidetes | 1003 | 4757 |
| Class | Bacteroidia | 366 | 3294 |
| Order | Bacteroidales | 366 | 3294 |
| Family | Bacteroidaceae | 121 | 2684 |
2.5 Gap-filler accuracy
Precision, recall and F1-score were calculated to evaluate the gap-filler’s accuracy using the different weighting schemes tested in this work. We call a gap-filler solution accurate if the gap-filler comes very close to suggesting adding to the model the very same reactions that were removed from the model. Precision is defined as the fraction of the reactions suggested by the gap-filler that are in the set of reactions removed; recall is defined as the fraction of the reactions removed that were recovered by the gap-filler; F1-score is the harmonic mean of precision and recall and is defined as the following:

$$F_1 = 2 \times \frac{precision \times recall}{precision + recall}$$
F1-score is calculated using the average precision and average recall for all 300 simulations for a given weighting scheme (100 simulations for each of 3 cases with 1, 5 or 10 reactions removed). Each simulation offers four possible outcomes: (i) no solution is found within the set time limit on the solver (we did not see this case in any of our experiments); (ii) an optimal solution is found; (iii) a suboptimal solution is found that can produce all biomass metabolites but was interrupted due to the time limit on the solver; or (iv) no solution is found that can produce all biomass metabolites within the set time limit on the solver. Only the first three types of cases are considered when calculating precision and recall.
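The accuracy calculation can be sketched as follows, with precision and recall computed per simulation and the F1-score computed from their averages, as described above. Function names and the reaction identifiers in the example are illustrative assumptions.

```python
def precision_recall(removed, suggested):
    """Precision and recall of one gap-filling solution."""
    removed, suggested = set(removed), set(suggested)
    hits = len(removed & suggested)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(removed) if removed else 0.0
    return precision, recall


def f1_from_averages(per_simulation):
    """F1-score from the average precision and average recall."""
    precisions, recalls = zip(*per_simulation)
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Two toy simulations: one perfect recovery, one with a single substitution.
sim_1 = precision_recall({"rxn1", "rxn2"}, {"rxn1", "rxn2"})
sim_2 = precision_recall({"rxn1", "rxn2"}, {"rxn1", "rxn9"})
f1 = f1_from_averages([sim_1, sim_2])
```

A gap-filler that substitutes an alternative reaction for a removed one loses both precision and recall for that simulation, even when the substituted reaction restores growth.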
3 Results
3.1 Evaluation on E.coli K-12 MG1655
3.1.1 Basic gap-filler (BasicGap)
To develop a baseline measure of gap-filler accuracy, we first evaluated the candidate reaction-weighting scheme for a basic gap-filler. Our basic gap-filler assigns the same weight to every candidate reaction and does not use taxonomic information. Figure 1 shows the average F1-scores for all the gap-filler variations tested for ECOLI. The bar chart also shows the F1-score of each simulated case labeled within the bars (i.e. whether 1, 5 or 10 reactions were removed from the original model represented as RR1, RR5 or RR10, respectively) with the final average F1-score for a particular weighting scheme shown to the right of the bars. The individual precision and recall scores are not shown as they are very close (within a difference of one) for most cases and are well represented by the F1-score. The results for BasicGap (stripes) are shown at the bottom of Figure 1.
Fig. 1.
Average F1-scores for different gap-filler variations for ECOLI. Each row represents results for a different gap-filler variation as well as the different taxonomic levels. The numbers in each row represent the F1-score for each case (i.e. with 1, 5 or 10 reactions removed, labeled as RR1, RR5 and RR10). These F1-scores are calculated from the average precision and recall from 100 simulations per case. The average F1-score is shown to the right of each bar
3.1.2 MetaFlux gap-filler (MetaFluxC)
Next, we evaluated the default gap-filling mode in MetaFlux (GenDev Technique C), which assigns a slightly different cost to candidate reactions depending on whether the organism being gap-filled is inside or outside the taxonomic range of the reaction, or whether the taxonomic range of the reaction is unknown, based on pathway taxonomic range information in MetaCyc. The results for ECOLI shown in Figure 1 (dotted) clearly show that the MetaFlux gap-filler performed significantly better than the basic gap-filler (an average F1-score of 91.0 versus 80.3; χ2 = 49.0, P < 0.001).
3.1.3 Fixed weights by taxonomic group in rank (TaxFixed)
The previous method used MetaCyc pathway taxonomic range information (which is limited in quantity and based on curator supposition) to estimate whether a reaction is typically used or not used by the taxonomic group of the organism being gap-filled, or whether no taxonomic range information is assigned for the reaction. The TaxFixed method uses a binary weighting scheme for reactions, but the applicability of a reaction to the organism is evaluated using the TRMs based on whether any BioCyc-B PGDB in the current taxonomic group (Proteobacteria, Enterobacteriaceae, etc.) contains the reaction.
The assumption behind this idea is that organisms within the same taxonomic groups may tend to use similar reactions and pathways when multiple alternatives exist. So, a candidate reaction is given a lower cost if at least one other BioCyc-B PGDB within the same taxonomic group of the organism to be gap-filled has that reaction. The average F1-scores across all cases for each rank are 84.6, 84.6, 84.3 and 86.1 for phylum, class, order and family, respectively (χ2 = 2.9, df = 3, NS) (see Fig. 1, grey). A slight improvement is seen at the family level, with no significant differences between the other three levels. However, the results for all rank levels are worse than for the MetaFluxC method (χ2 = 25.1, P < 0.001). By investigating individual simulation results, we found that alternative reactions suggested by the gap-filler have the same cost as the reactions removed; thus, the final optimal objective values are the same, and they are valid alternative solutions.
3.1.4 Variable weights by frequency in taxonomic group in rank (TaxVariable-Freq)
Then, we explored the idea of assigning variable costs to candidate reactions based on different taxonomic levels and reaction data from BioCyc-B PGDBs. We chose to use the frequency percentage with which a reaction exists in BioCyc-B PGDBs of a specified taxonomic group to assign different costs to different reactions. Using this weighting scheme, a reaction that occurs more frequently will be assigned a lower cost than another reaction that occurs less frequently among BioCyc-B PGDBs. We again evaluated this weighting scheme at the same four taxonomic levels. The average F1-scores across all cases for each rank are 99.0, 99.0, 99.4 and 99.4 for phylum, class, order and family, respectively (χ2 = 0.63, df = 3, NS) (see Fig. 1, black). While no significant differences between the results for the different taxonomic levels are seen, this method resulted in a significant improvement over all previous methods of assigning candidate reaction costs (χ2 = 87.8, P < 0.001 when compared with MetaFluxC). A discussion on the distribution of reaction costs for the TaxVariable-Freq method can be found in Supplementary Material, Table S1 and Figure S2.
3.1.5 Weight parameters sensitivity analysis
Since several parameters are involved in calculating the cost of a reaction using the variable weights by frequency method, we also performed a sensitivity analysis on the relevant parameters: range, which is the spread of the cost to be added to the base cost for adding a reaction, and the out-of-taxon cost, which applies to any candidate reaction that is outside of the current taxonomic group (i.e. reactions that do not exist in any BioCyc-B PGDBs for the taxonomic group tested). For the simulations done at the phylum level, we tested several values above and below the current parameter value for range, as well as different formulae for calculating the costs for out-of-taxon reactions, and found that the results do not change significantly (see Table 2). Ultimately, we set the value for range to 10 for simplicity and used a fixed cost of −80 for out-of-taxon reactions; these values are used for all relevant simulations in this work.
Table 2.
Weight parameters sensitivity analysis
| Run | Range | Base | OTC | OTC formula | Average F1-score |
|---|---|---|---|---|---|
| 1 | 5 | −30 | −70 | 2*(base − range) | 99.0 |
| 2 | 10 | −30 | −80 | 2*(base − range) | 99.4 |
| 3 | 50 | −30 | −160 | 2*(base − range) | 99.2 |
| 4 | 100 | −30 | −260 | 2*(base − range) | 99.3 |
| 5 | 10 | −30 | −70 | 2*base − range | 99.3 |
| 6 | 10 | −30 | −160 | 4*(base − range) | 99.3 |
OTC, out-of-taxon cost.
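The out-of-taxon costs in Table 2 follow from the listed formulae and parameter values. As a quick check, the Python sketch below recomputes the OTC column from the Range, Base and OTC formula columns. The function and variable names are illustrative only and are not taken from MetaFlux.

```python
# Recompute the out-of-taxon cost (OTC) column of Table 2.
# All names here are hypothetical helpers, not part of MetaFlux.

OTC_FORMULAS = {
    "2*(base - range)": lambda base, rng: 2 * (base - rng),
    "2*base - range": lambda base, rng: 2 * base - rng,
    "4*(base - range)": lambda base, rng: 4 * (base - rng),
}

# (run, range, base, formula) rows transcribed from Table 2.
RUNS = [
    (1, 5, -30, "2*(base - range)"),
    (2, 10, -30, "2*(base - range)"),
    (3, 50, -30, "2*(base - range)"),
    (4, 100, -30, "2*(base - range)"),
    (5, 10, -30, "2*base - range"),
    (6, 10, -30, "4*(base - range)"),
]


def out_of_taxon_cost(base, rng, formula):
    """Evaluate one of the OTC formulae for a given base cost and range."""
    return OTC_FORMULAS[formula](base, rng)


otc_by_run = {
    run: out_of_taxon_cost(base, rng, formula)
    for run, rng, base, formula in RUNS
}

for run, otc in otc_by_run.items():
    print(f"run {run}: OTC = {otc}")
```

Run 2 corresponds to the settings adopted in this work (range of 10 and out-of-taxon cost of −80).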
3.1.6 Variable weights by normalized frequency in taxonomic group in rank (TaxVariable-NormFreq)
Finally, we realized that the gap-filler results might be affected by the number of BioCyc PGDBs for particular organisms or taxonomic groups. For example, at the genus level, there are 568 PGDBs for Escherichia, 348 for Lactobacillus and 123 for Bacteroides. Since we are gap-filling the E.coli model, the significant improvement in gap-filler accuracy might simply be due to the high number of Escherichia PGDBs within the Proteobacteria phylum. To investigate this issue, we modified the way the frequency percentage (freq) of a reaction is calculated by normalizing the frequency based on the data for a taxonomic rank one level deeper. That is, when testing the phylum level, reaction frequencies were calculated for each class under that phylum, normalized by the number of PGDBs in each class, and finally averaged over all classes (see Section 2). The same equation as before was then used to calculate the final cost of adding that reaction. This method was applied to just the phylum level on the same dataset, and the results are shown in Figure 1 (white). This normalized frequency weighting scheme resulted in an average F1-score of 97.9; while slightly lower than the unnormalized scheme, it is still significantly better than the MetaFluxC weighting scheme. A discussion of the distribution of reaction costs for the TaxVariable-NormFreq method can be found in Supplementary Material, Table S2 and Figure S3.
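The difference between the raw and the normalized frequency can be illustrated with a small Python sketch. It assumes each PGDB is represented as a set of reaction identifiers and that PGDBs are grouped by class within a phylum; the data, class names and function names are invented for illustration, and Section 2 remains the authoritative definition.

```python
# Toy comparison of raw and class-normalized reaction frequencies.
# Data and names are hypothetical; each PGDB is a set of reaction IDs.


def raw_frequency(pgdbs, reaction):
    """Fraction of the given PGDBs that contain the reaction."""
    return sum(reaction in pgdb for pgdb in pgdbs) / len(pgdbs)


def normalized_frequency(pgdbs_by_class, reaction):
    """Average of the per-class frequencies, so each class counts once."""
    per_class = [
        raw_frequency(pgdbs, reaction) for pgdbs in pgdbs_by_class.values()
    ]
    return sum(per_class) / len(per_class)


# One heavily sampled class and one sparsely sampled class.
pgdbs_by_class = {
    "class_A": [{"R1"}, {"R1"}, {"R1"}, {"R1", "R2"}],
    "class_B": [{"R2"}],
}
all_pgdbs = [pgdb for pgdbs in pgdbs_by_class.values() for pgdb in pgdbs]

raw_r1 = raw_frequency(all_pgdbs, "R1")                 # 4 of 5 PGDBs
norm_r1 = normalized_frequency(pgdbs_by_class, "R1")    # mean of 1.0 and 0.0
raw_r2 = raw_frequency(all_pgdbs, "R2")                 # 2 of 5 PGDBs
norm_r2 = normalized_frequency(pgdbs_by_class, "R2")    # mean of 0.25 and 1.0

print(raw_r1, norm_r1, raw_r2, norm_r2)
```

In this toy case, the reaction that is common only in the heavily sampled class drops in frequency after normalization, which is the bias the normalized scheme is meant to reduce.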
3.1.7 Simulations with 100 reactions removed for BasicGap and TaxVariable-Freq (phylum)
We also evaluated a larger number of reactions removed to reflect situations in which the metabolic network is far from complete. Specifically, we evaluated BasicGap and TaxVariable-Freq (phylum) on ECOLI when 100 essential reactions were removed. Using the same solver time limit of 30 min as in all other computational experiments in this work, the average F1-score across 100 simulations was 74.1 for BasicGap (precision = 76.2, recall = 72.2) and 92.3 for TaxVariable-Freq (phylum) (precision = 94.3, recall = 90.4). With 100 reactions removed, none of the simulations for either weighting scheme completed optimally within the given solver time limit, although all suboptimal solutions found could produce all biomass metabolites (simulations did not complete to optimality even when the solver time limit was set to 24 h for both methods). Although the overall accuracies of both methods are lower than when 10 reactions were removed, the TaxVariable-Freq method still clearly produced a more accurate result than the BasicGap method.
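The reported F1-scores are consistent with the harmonic mean of the reported precision and recall values, which matches the figure captions stating that F1-scores are calculated from the average precision and recall. A minimal Python check, with an illustrative function name:

```python
# F1 as the harmonic mean of precision and recall (both given in percent).


def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values quoted for the 100-reactions-removed case.
f1_basicgap = round(f1_score(76.2, 72.2), 1)
f1_taxvariable = round(f1_score(94.3, 90.4), 1)

print(f1_basicgap, f1_taxvariable)
```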
3.2 Evaluation on models in other taxonomic groups
To extend the generality of our findings, we performed the same steps in evaluating the different candidate reaction-weighting schemes using genome-scale metabolic models from two different taxonomic groups: L.rhamnosus GG of the Firmicutes phylum and B.thetaiotaomicron VPI-5482 of the Bacteroidetes phylum.
3.2.1 L.rhamnosus GG
The gap-filler evaluation results for the LRHA model are mostly similar to those for the ECOLI model (see Fig. 2). Overall, the BasicGap method produced the worst result (average F1-score of 83.4), and the MetaFluxC method performed relatively well (average F1-score of 91.6). Interestingly, the results for the TaxFixed method are comparable to the MetaFluxC results, with average F1-scores of 91.1, 91.2, 91.2 and 90.7 for phylum, class, order and family, respectively. However, note that for the LRHA model, the gap-filler did not complete optimally within the given solver time limit of 30 min for 7 out of 1124 simulations [i.e. (81 + 100 + 100)*4; there are only 81 simulations for the one-reaction-removed case because there are only 81 essential reactions in this model], although all suboptimal solutions found for the fixed weight method could produce all biomass metabolites. In addition, two simulations for the experiment using the basic gap-fill method did not complete within the time limit; in both cases, the gap-filler failed to find even suboptimal solutions that could produce all biomass metabolites. Finally, both variable weights methods (TaxVariable-Freq and TaxVariable-NormFreq) surprisingly gave the same result and the best overall performance, with an average F1-score of 98.6. Once again, no significant differences in gap-filler accuracy based on different taxonomic ranks are seen.
Fig. 2.
Average F1-scores for different gap-filler variations for LRHA. Each row represents results for a different gap-filler variation as well as the different taxonomic levels. The numbers in each row represent the F1-score for each case (i.e. with 1, 5 or 10 reactions removed labeled as RR1, RR5 and RR10). These F1-scores are calculated from the average precision and recall from 100 simulations per case (except for RR1, which only has 81 simulations). The average F1-score is shown to the right of each bar
3.2.2 B.thetaiotaomicron VPI-5482
The general pattern holds when evaluating the different reaction-weighting schemes using the BTHE model (see Fig. 3). The BasicGap method performed worse for the BTHE model than for any other model in this work, with an average F1-score of 65.3. The MetaFluxC method achieved an average F1-score of 82.7; five of its simulation runs did not complete within the solver time limit but returned reaction sets that could complete the network to produce all biomass metabolites. Interestingly, the TaxFixed method at rank levels class and lower has higher accuracy than the MetaFluxC method. Nevertheless, the variable weights methods (TaxVariable-Freq and TaxVariable-NormFreq) still outperformed all other methods, with accuracies in the mid-90s. Note that 1 out of the 1200 simulations performed using the TaxVariable-Freq method hit the solver time limit and was stopped, while 10 such simulations were seen for TaxFixed. Unsurprisingly, most of these time-limit cases occurred for the higher numbers of reactions removed (some RR5 but mostly RR10), as they contained more gaps to fill and more alternative paths to consider.
Fig. 3.
Average F1-scores for different gap-filler variations for BTHE. Each row represents results for a different gap-filler variation as well as the different taxonomic levels. The numbers in each row represent the F1-score for each case (i.e. with 1, 5 or 10 reactions removed labeled as RR1, RR5 and RR10). These F1-scores are calculated from the average precision and recall from 100 simulations per case. The average F1-score is shown to the right of each bar
4 Discussion
In this work, we performed computational experiments to evaluate different gap-filler variations using genome-scale models for E.coli (ECOLI), L.rhamnosus (LRHA) and B.thetaiotaomicron (BTHE) that we developed within the Pathway Tools environment. ECOLI is the only model that has been experimentally verified. LRHA and BTHE are incomplete models that could be expanded by including more biomass metabolites and their corresponding pathways, and verified against experimental data. However, such an expansion is out of the scope of this study, and we evaluated the gap-filler performance using these smaller models. We believe the results are still meaningful, because we observed similar improvements across all models. That is, the TaxVariable-Freq method shows a significant improvement in gap-filler accuracy over the current default MetaFluxC method across all three models. Although LRHA and BTHE have limited biomass metabolites, they include the most common biomass metabolites such as amino acids, ribonucleotides and deoxyribonucleotides (for BTHE) and additionally some lipids, cofactors, peptidoglycan and wall polysaccharides (for LRHA). Finally, the ECOLI model, which has the most biomass metabolites of all, achieved the highest accuracy (average F1-score of 99.0) when using the TaxVariable-Freq method. This represents a significant increase compared to the 80.3 found when using the BasicGap method, 91.0 using the MetaFluxC method or 84.6 using the TaxFixed method. We also evaluated the TaxVariable method with normalized frequencies (TaxVariable-NormFreq) and found that it has a slightly lower average F1-score of 97.9, but it still offers a significant performance increase over the other methods (χ2 = 87.9, P < 0.001 when comparing TaxVariable-Freq to MetaFluxC). Overall, no significant differences between different rank levels were seen.
Evaluation on the metabolic models for two other microbial strains showed similar improvements: L.rhamnosus improved from 91.6 (MetaFluxC) to 98.6 (TaxVariable-Freq) (χ2 = 62.4, P < 0.001) and B.thetaiotaomicron improved from 82.7 (MetaFluxC) to 96.0 (TaxVariable-Freq) (χ2 = 106.3, P < 0.001).
A Venn diagram of the reactions in the three phyla studied in this work, computed from the E.coli phylum TRM, is shown in Supplementary Figure S1 (some numbers do not match Table 1, because Table 1 uses TRMs for different organisms). The numbers represent the number of reactions within each phylum and those that overlap between two or more phyla. For example, there are 7577 reactions in all Proteobacteria BioCyc-B PGDBs, 1788 of which are not found in Firmicutes or Bacteroidetes. All three phyla share 4403 reactions; Proteobacteria and Bacteroidetes share 300 reactions not found in any Firmicutes BioCyc-B PGDBs. This figure demonstrates that different taxonomic groups can have significantly different reaction usage.
4.1 Why gap-filler solutions differed from reactions removed
Given that the reactions selected for removal from the models to generate gapped models were all essential reactions, we expected the gap-filler to return either the original reaction that was removed or a different reaction that represented an alternative solution (had we removed reactions that did not carry flux, we would not have created a gap). We discuss three different types of cases observed where a different reaction was suggested using the TaxVariable-Freq method at the phylum level. Whenever a reaction identifier is shown in this section, its cost based on the TaxVariable-Freq method is shown in parentheses.
4.1.1 Alternative reaction has the same cost
Although alternative reactions having the same cost is the primary weakness for all the weighting schemes that use a fixed weight, the problem also occurred for the TaxVariable-Freq method. While rare, if the alternative reaction appears at about the same frequency as the removed reaction, they could have the same rounded percentage value, which gives the same cost.
4.1.2 Cost of alternative reaction is lower than the removed reaction
This case can only happen with the variable weights methods (TaxVariable-Freq and TaxVariable-NormFreq), as candidate reaction weights are calculated based on frequency; therefore, the weight values are more variable. For the ECOLI model, when ASPDECARBOX-RXN (−38.9) was removed, the gap-filler suggested adding RXN-2901 (−36.3). These values correspond to ASPDECARBOX-RXN appearing in only 11% of the BioCyc PGDBs in the Proteobacteria phylum, whereas RXN-2901 appeared in 37% of the PGDBs. When simulating this case manually using MetaFlux, we can see that ASPDECARBOX-RXN is needed to produce succinyl-CoA and its removal caused a gap in the beta-alanine biosynthesis pathway. The gap-filler identified RXN-2901, a reversible beta-alanine aminotransferase reaction in the beta-alanine degradation pathway, as the reaction to be added. Although this reaction is used more frequently within the phylum, it is not in fact present in E.coli.
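The two costs quoted above can be reproduced from the stated frequencies with a simple linear form. The Python sketch below assumes cost = base − range × (1 − freq), with the base of −30 and range of 10 from Table 2. This form is an assumption inferred from the quoted numbers, not a restatement of the cost equation, which is defined in Section 2. The sketch also assumes that the negative values act as penalties, so the value closer to zero is the cheaper addition.

```python
# Assumed linear cost form: base - range * (1 - freq), with freq in [0, 1].
# BASE and RANGE come from Table 2; the formula itself is an inference from
# the costs quoted in the text, not the paper's stated equation.
BASE = -30
RANGE = 10


def reaction_cost(freq, base=BASE, rng=RANGE):
    """Cost of adding a reaction found in a fraction `freq` of the PGDBs."""
    return base - rng * (1.0 - freq)


cost_removed = round(reaction_cost(0.11), 1)      # ASPDECARBOX-RXN, 11%
cost_alternative = round(reaction_cost(0.37), 1)  # RXN-2901, 37%

# The less negative value is the smaller penalty (assumption stated above).
candidates = {
    "ASPDECARBOX-RXN": cost_removed,
    "RXN-2901": cost_alternative,
}
preferred = max(candidates, key=candidates.get)

print(cost_removed, cost_alternative, preferred)
```

Under these assumptions, the more frequent reaction receives the smaller penalty, which is why the gap-filler selects RXN-2901 even though it is not the reaction that was removed.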
4.1.3 Two reactions removed can be replaced with one MetaCyc reaction
In some cases, two of the randomly chosen reactions removed as part of a set of five or ten happened to be connected. For example, in the LRHA model, RXN-16910 (−30.5) and RXN-16909 (−30.5) are two connected reactions that convert hydrogen carbonate to carbamate through the shared carboxyphosphate compound. There happens to be a carbamate kinase reaction in MetaCyc, RXN-19659 (−39.2), that essentially performs the two steps in one. Although the two reactions occur more frequently (95%), it still costs less to add the one reaction that occurs less frequently (8%) than to add the two (−39.2 versus −61).
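The comparison reduces to summing the costs of each candidate set. The Python sketch below uses the costs quoted above and assumes that the total cost of a solution is the sum of its per-reaction costs, with the value closer to zero being preferred. The names are illustrative.

```python
# Compare the total cost of two candidate gap-filling solutions, assuming
# the total is the sum of per-reaction costs (an assumption for this sketch).


def total_cost(costs):
    """Sum the per-reaction costs of one candidate solution."""
    return sum(costs.values())


two_step = {"RXN-16910": -30.5, "RXN-16909": -30.5}
one_step = {"RXN-19659": -39.2}

solutions = {"two_step": two_step, "one_step": one_step}
totals = {name: total_cost(costs) for name, costs in solutions.items()}

# The total closer to zero is the smaller penalty.
cheapest = max(totals, key=totals.get)

print(totals, cheapest)
```

The single less frequent reaction is selected because its penalty is smaller than the combined penalty of the two more frequent reactions.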
4.2 Gap-filler accuracy
We note that comparing the accuracy of the gap-filler variations evaluated here with other published algorithms is not straightforward, as the universal reaction database for candidate reactions is often different for each gap-filler algorithm (KEGG, ModelSEED, BiGG or MetaCyc). Further, the methods by which gap-filler performance is assessed can also vary. For example, the authors of the DEF (Liu et al., 2017) method also randomly removed reactions from published models and assessed their gap-filler based on recall. However, one difference between our methods is that we only removed essential reactions to guarantee the creation of a gapped model. If non-essential reactions were removed, we would not expect the gap-filler to recover those reactions and thus would expect a lower recall (and thus F1-score).
5 Conclusions
We have developed an enhanced gap-filling method that uses taxonomic information to assign costs to candidate gap-filling reactions obtained from a universal reaction database. The assumption is that different taxonomic groups are biased toward using different metabolic reactions. After evaluating different variations of the gap-filler on randomly degraded versions of the E.coli model in EcoCyc, we found that the TaxVariable-Freq (variable weights by frequency in taxonomic group) method outperformed all other methods tested (BasicGap, MetaFluxC, TaxFixed and TaxVariable-NormFreq). We did not find any significant differences between different rank levels when using the TaxFixed or TaxVariable methods. Finally, similar improvements were observed when using two other metabolic models from different taxonomic groups (LRHA and BTHE).
Software availability
The Pathway Tools software (including MetaFlux) is free for academic use, including source code; a fee applies for commercial use. See http://pathwaytools.com for download instructions and contact information. Additional code for reproducing the experimental results presented here is at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip.
Funding
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number GM075742. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of Interest: none declared.
Supplementary Material
References
- Caspi R. et al. (2018) The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res., 46, D633–D639.
- DeJongh M. et al. (2007) Toward the automated generation of genome-scale metabolic networks in the SEED. BMC Bioinformatics, 8, 139.
- Faria J.P. et al. (2018) Methods for automated genome-scale metabolic model reconstruction. Biochem. Soc. Trans., 46, 931–936.
- Gleixner A. et al. The SCIP Optimization Suite 6.0. 40. Available at: http://www.optimization-online.org/DB_FILE/2018/07/6692.pdf.
- Hamilton J.J., Reed J.L. (2014) Software platforms to facilitate reconstructing genome-scale metabolic networks. Environ. Microbiol., 16, 49–59.
- Henry C.S. et al. (2010) High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol., 28, 977–982.
- Herrgård M.J. et al. (2006) Identification of genome-scale metabolic network models using experimentally measured flux profiles. PLoS Comput. Biol., 2, e72.
- Kanehisa M. et al. (2017) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res., 45, D353–D361.
- Karp P.D. et al. (2016) Pathway tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief Bioinform., 17, 877–890.
- Karp P.D. et al. (2019) The BioCyc collection of microbial genomes and metabolic pathways. Brief Bioinform., 20, 1085–1093.
- Karp P.D. et al. (2018a) How accurate is automated gap filling of metabolic models? BMC Syst. Biol., 12, 73.
- Karp P.D. et al. (2018b) The EcoCyc database. EcoSal.Plus, 8.
- King Z.A. et al. (2016) BiGG models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res., 44, D515–D522.
- Kumar V.S. et al. (2007) Optimization based automated curation of metabolic reconstructions. BMC Bioinformatics, 8, 212.
- Latendresse M. et al. (2012) Construction and completion of flux balance models from pathway databases. Bioinformatics, 28, 388–396.
- Latendresse M., Karp P. (2018) Evaluation of reaction gap-filling accuracy by randomization. BMC Bioinformatics, 19, 53.
- Liu L. et al. (2017) DEF: an automated dead-end filling approach based on quasi-endosymbiosis. Bioinformatics, 33, 405–413.
- Machado D. et al. (2018) Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res., 46, 7542–7553.
- McCloskey D. et al. (2013) Basic and applied uses of genome-scale metabolic network reconstructions of Escherichia coli. Mol. Syst. Biol., 9, 661.
- Orth J.D., Palsson B.O. (2010) Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng., 107, 403–412.
- Orth J.D. et al. (2010) What is flux balance analysis? Nat. Biotechnol., 28, 245–248.
- Pan S., Reed J.L. (2018) Advances in gap-filling genome-scale metabolic models and model-driven experiments lead to novel metabolic discoveries. Curr. Opin. Biotechnol., 51, 103–108.
- R Core Team. (2019) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Reed J.L. et al. (2006) Systems approach to refining genome annotation. Proc. Natl. Acad. Sci. USA, 103, 17480–17484.
- Sayers E.W. et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 37, D5–D15.
- Thiele I., Palsson B.Ø. (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc., 5, 93–121.
- Vitkin E., Shlomi T. (2012) MIRAGE: a functional genomics-based approach for metabolic network model reconstruction and its application to cyanobacteria networks. Genome Biol., 13, R111.
- Wang H. et al. (2018) RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor. PLoS Comput. Biol., 14, e1006541.
- Weaver D.S. et al. (2014) A genome-scale metabolic flux model of Escherichia coli K-12 derived from the EcoCyc database. BMC Syst. Biol., 8, 79.