Abstract
RAVEN is a commonly used MATLAB toolbox for genome-scale metabolic model (GEM) reconstruction, curation and constraint-based modelling and simulation. Here we present RAVEN Toolbox 2.0 with major enhancements, including: (i) de novo reconstruction of GEMs based on the MetaCyc pathway database; (ii) a redesigned KEGG-based reconstruction pipeline; (iii) convergence of reconstructions from various sources; (iv) improved performance, usability, and compatibility with the COBRA Toolbox. Capabilities of RAVEN 2.0 are here illustrated through de novo reconstruction of GEMs for the antibiotic-producing bacterium Streptomyces coelicolor. Comparison of the automated de novo reconstructions with the iMK1208 model, a previously published high-quality S. coelicolor GEM, exemplifies that RAVEN 2.0 can capture most of the manually curated model. The generated de novo reconstruction is subsequently used to curate iMK1208, resulting in Sco4, the most comprehensive GEM of S. coelicolor, with increased coverage of both primary and secondary metabolism. This increased coverage allows the use of Sco4 to predict novel genome editing targets for optimized secondary metabolite production. As such, we demonstrate that RAVEN 2.0 can be used not only for de novo GEM reconstruction, but also for curating existing models based on up-to-date databases. Both RAVEN 2.0 and Sco4 are distributed through GitHub to facilitate usage and further development by the community (https://github.com/SysBioChalmers/RAVEN and https://github.com/SysBioChalmers/Streptomyces_coelicolor-GEM).
Author summary
Cellular metabolism is a large and complex network. Hence, investigations of metabolic networks are aided by in silico modelling and simulations. Metabolic networks can be derived from whole-genome sequences by identifying which enzymes are present and connecting these to formalized chemical reactions. To facilitate the reconstruction of genome-scale models of metabolism (GEMs), we have developed RAVEN 2.0. This versatile toolbox can rapidly reconstruct GEMs, either from the metabolic pathway databases KEGG and MetaCyc or from homology with an existing GEM. We demonstrate RAVEN's functionality through generation of a metabolic model of Streptomyces coelicolor, an antibiotic-producing bacterium. Comparison of this de novo generated GEM with a previously manually curated model demonstrates that RAVEN captures most of the previous model, and we subsequently reconstructed an updated model of S. coelicolor: Sco4. We then used Sco4 to predict promising targets for genetic engineering, which can be used to increase antibiotic production.
Introduction
Genome-scale metabolic models (GEMs) are comprehensive in silico representations of the complete set of metabolic reactions that take place in a cell [1]. GEMs can be used to understand and predict how organisms react to variations in genetic and environmental parameters [2]. Recent studies demonstrated the extensive applications of GEMs in discovering novel metabolic engineering strategies [3]; studying microbial communities [4]; finding biomarkers for human diseases and personalized and precision medicines [5,6]; and improving antibiotic production [7]. Despite the increasing ease of obtaining whole-genome sequences, significant challenges remain in translating this knowledge into high-quality GEMs [8].
To meet the increasing demand for metabolic network modelling, the original RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) toolbox was developed to facilitate GEM reconstruction, curation, and simulation [9]. In addition to facilitating the analysis and visualization of existing GEMs, RAVEN particularly aimed to assist semi-automated draft model reconstruction, utilizing existing template GEMs and the KEGG database [10]. Since publication, RAVEN has been used in GEM reconstruction for a wide variety of organisms, including bacteria [11], archaea [12], the human gut microbiome [13], eukaryotic microalgae [14], parasites [15–17] and fungi [18], as well as various human tissues [19,20] and generic mammalian models with complex metabolism [21,22]. As such, the RAVEN toolbox has functioned as one of the two major MATLAB-based packages for constraint-based metabolic modelling, together with the COBRA Toolbox [23–25].
Here, we present RAVEN 2.0 with greatly enhanced reconstruction capabilities, together with additional new features (Fig 1, Table 1). A prominent enhancement of RAVEN 2.0 is the use of the MetaCyc database in assisting draft model reconstruction. MetaCyc is a pathway database that collects only experimentally verified pathways with curated reversibility information and mass-balanced reactions [26]. RAVEN 2.0 can leverage this high-quality database to enhance the GEM reconstruction process. While the functionality of the original RAVEN toolbox was illustrated by reconstructing a GEM of Penicillium chrysogenum [9], we here demonstrate the new and improved capabilities and wide applicability of RAVEN 2.0 through reconstruction of a GEM for Streptomyces coelicolor.
Table 1. Feature comparison of GEM reconstruction toolboxes.
Features | MEMO Sys | FAME | Microbes Flux | CoReCo | Pathway Tools | RAVEN 1.0 | COBRA 3.0 (a) | Model SEED | merlin | RAVEN 2.0 |
---|---|---|---|---|---|---|---|---|---|---|
Reconstruct GEM for - Prokaryote | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |
- Eukaryote | ✔ | ✔ | ✔ | ✔ | ✔ | |||||
- Tissues/cell type | ✔ | ✔ | ||||||||
Reconstruct GEM based on - KEGG | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
- MetaCyc | ✔ | ✔ | ||||||||
- HMR | ✔ | |||||||||
- Template model (b) | ✔ | ✔ | ✔ | |||||||
Import/Export - SBML | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
- YAML | (e) | ✔ | ||||||||
- Excel | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
Mass and charge balance (c) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
Define sub-cellular localisation (d) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
Annotate transporters during reconstruction | ✔ | ✔ | ✔ | ✔ | ||||||
Include spontaneous reactions during reconstruction | ✔ | ✔ | ||||||||
Flux balance analysis simulation | ✔ | ✔ | ✔ | ✔ | ✔ | |||||
Pathways visualisation (f) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
a. While COBRA does not support GEM reconstruction, it is included here as a widely used MATLAB toolbox for GEM analysis.
b. An existing GEM can be used as template for model reconstruction based on sequence homology.
c. The mass- and charge-balanced reactions are derived from the MetaCyc database.
d. The sub-cellular localization of reactions can be estimated using the predictLocalization function in RAVEN.
e. COBRApy supports YAML.
f. RAVEN allows the visualization of simulation results by overlaying information on pre-drawn metabolic maps.
S. coelicolor is a representative species of the soil-dwelling, filamentous, Gram-positive actinobacteria and harbours a rich set of secondary metabolite biosynthesis gene clusters [27,28]. As a well-known producer of pharmaceutical and bioactive compounds, S. coelicolor has been exploited for antibiotic and secondary metabolite production [29]. The first published GEM for S. coelicolor, iIB711 [30], was improved through an iterative process resulting in the GEMs iMA789 [31] and iMK1208 [32]. The most recent GEM, iMK1208, is a high-quality model that includes 1208 genes and 1643 reactions and was successfully used to predict metabolic engineering targets for increased production of actinorhodin [32].
Here, we demonstrate how the new functions of RAVEN can be used for de novo reconstruction of an S. coelicolor GEM, using comparison to the existing high-quality model iMK1208 as a benchmark. The use of three distinct de novo reconstruction approaches enabled capturing most of the existing model, while complementary reactions found through the de novo reconstructions gave the opportunity to improve the existing model. After manual curation, we included 402 new reactions into the GEM, with 320 newly associated enzyme-coding genes, including a variety of biosynthetic pathways for known secondary metabolites (e.g. 2-methylisoborneol, albaflavenone, desferrioxamine, geosmin, hopanoid and flaviolin dimer). The updated S. coelicolor GEM is released as Sco4, which can be used as an upgraded platform for future systems biology research on S. coelicolor and related species.
Results
Genome-scale reconstruction and curation with RAVEN 2.0
RAVEN 2.0 aims to provide a versatile and efficient toolbox for metabolic network reconstruction and curation (Fig 1). In comparison to other solutions for GEM reconstruction (Table 1), the strength of RAVEN is its capacity for semi-automated reconstruction based on published models and the KEGG and MetaCyc databases, integrating knowledge from diverse sources. A brief overview of RAVEN capabilities is given here, while more technical details are stated in Material and methods, and detailed documentation is provided for individual functions in the RAVEN package.
RAVEN supports two distinct approaches to initiate GEM reconstruction for an organism of interest: (i) based on protein homology to an existing template model, or (ii) de novo using reaction databases. The first approach requires a high-quality GEM of a phylogenetically closely related organism, and the functions getBlast and getModelFromHomology are used to infer homology using bidirectional BLASTP and subsequently build a draft model. Alternatively, de novo reconstruction can be based on two databases: KEGG and MetaCyc. For KEGG-based reconstruction, the user can deploy getKEGGModelForOrganism to either rely on KEGG-supplied annotations (KEGG currently includes over 5000 genomes) or query the organism's protein sequences for similarity to HMMs that are trained on genes annotated in KEGG. MetaCyc-based reconstruction can be initiated with getMetaCycModelForOrganism, which queries protein sequences with BLASTP for homology to enzymes curated in MetaCyc, while addSpontaneousRxns retrieves relevant non-enzyme associated reactions from MetaCyc.
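To make the homology-based route more concrete, the following Python sketch shows one common reading of "bidirectional" homology inference, namely reciprocal best hits between two proteomes. It is an illustration only: the function names, the tuple layout and the gene identifiers are hypothetical, and the actual criteria applied by getBlast and getModelFromHomology are not described here and may differ.

```python
def best_hits(hits):
    """Return the highest-scoring target for each query.

    hits: iterable of (query, target, bit_score) tuples (hypothetical layout).
    """
    best = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Keep pairs that are each other's best hit in both search directions."""
    fwd = best_hits(forward)   # organism of interest -> template organism
    rev = best_hits(reverse)   # template organism -> organism of interest
    return sorted((q, t) for q, t in fwd.items() if rev.get(t) == q)


# Made-up example data, not taken from any real BLASTP run.
forward = [
    ("org_g1", "tmpl_g1", 250.0),
    ("org_g1", "tmpl_g2", 120.0),
    ("org_g2", "tmpl_g2", 90.0),
]
reverse = [
    ("tmpl_g1", "org_g1", 245.0),
    ("tmpl_g2", "org_g1", 118.0),
]
pairs = bidirectional_best_hits(forward, reverse)
print(pairs)
```

In this toy case only org_g1 and tmpl_g1 select each other in both directions, so only that pair would carry reactions over from the template model.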
Regardless of which (combination of) approach(es) is followed, a draft model is obtained that requires further curation to result in a high-quality reconstruction suitable for simulating flux distributions. Various RAVEN functions aid in this process, including gapReport, which runs a gap analysis and reports e.g. dead-end reactions and unconnected subnetworks that indicate missing reactions and gaps in the model. It also reports metabolites that can be produced or consumed without in- or output from the model, which is indicative of unbalanced reactions. RAVEN is distributed with a gap-filling algorithm, gapFill; however, results from external gap-filling approaches can also be readily incorporated. This and further manual curation are facilitated through functions such as addRxnsGenesMets, which copies reactions from a template to a draft model; changeGeneAssoc and standardizeGrRules, which curate gene associations; and combineMetaCycKEGGModels, which can semi-automatically unify draft models reconstructed from different databases.
In addition to model generation, RAVEN includes basic simulation capabilities including flux balance analysis (FBA), random sampling of the solution space [33] and flux scanning with enforced objective function (FSEOF) [34]. Models can be handled in various file formats, including the community standard SBML L3V1 FBCv2 that is compatible with many other constraint-based modelling tools, including the COBRA Toolbox [23], as well as non-MATLAB tools such as COBRApy [35] and SBML-R [36]. As the SBML file format is unsuitable for tracking changes between model versions, support for flat-text and YAML formats is provided. In addition, models can be represented in a user-friendly Excel format. As a MATLAB package, RAVEN gives users flexibility to build their own reconstruction and analysis pipelines according to their needs.
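One of the checks mentioned above, the detection of dead ends, can be illustrated with a small Python sketch. It assumes a toy model given as a dictionary of reactions with stoichiometric coefficients and a reversibility flag, and flags metabolites that can only be produced or only be consumed. The data layout and function name are hypothetical; this is not the gapReport implementation.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed.

    reactions: dict mapping reaction id -> (stoichiometry, reversible),
    where stoichiometry maps metabolite -> coefficient
    (negative = substrate, positive = product).
    """
    producible, consumable = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            if coeff == 0:
                continue
            # A reversible reaction can both produce and consume each participant.
            if coeff > 0 or reversible:
                producible.add(met)
            if coeff < 0 or reversible:
                consumable.add(met)
    # Symmetric difference: in exactly one of the two sets.
    return sorted(producible ^ consumable)


# Toy network: A is taken up, converted to B, then to C, but C is never used.
toy_model = {
    "uptake_A": ({"A": 1}, False),
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
}
dead_ends = dead_end_metabolites(toy_model)
print(dead_ends)
```

Here metabolite C is reported because nothing consumes it, which in a real draft model would point to a missing downstream or exchange reaction.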
Draft model reconstruction for S. coelicolor
The enriched capabilities of RAVEN 2.0 were evaluated by de novo generation of GEMs for S. coelicolor using three distinct approaches, as described in Material and methods (Fig 2). Cross-comparison of genes from de novo reconstructions and the published S. coelicolor GEM iMK1208 indicated that the three de novo approaches are complementary and comprehensive, together covering 88% of the genes included in iMK1208 (Fig 3). The existing model contained 146 genes that were not annotated by any of the automated approaches, signifying the valuable manual curation that has gone into previous GEMs of S. coelicolor. Nonetheless, matching of metabolites across models through their KEGG identifiers further supported that most of the previous GEM is captured by the three de novo reconstructions, while each approach makes its own unique contribution (Fig 3).
The three draft reconstructions were consecutively merged to result in a combined draft reconstruction (S1 Data), containing 2605 reactions, of which 958 and 1104 reactions were uniquely from MetaCyc- and KEGG-based reconstructions, respectively (Fig 2). While MetaCyc-based reconstruction annotated more genes than KEGG-based reconstructions (Fig 3), the number of unique reactions from MetaCyc is slightly lower than from KEGG, indicating that KEGG-based reconstruction is more likely to assign genes to multiple reactions. Of the 789 reactions from the existing high-quality model that could be mapped to either MetaCyc or KEGG reactions, 733 (92.9%) were included in the combined draft model (S1 Table).
Further development of the obtained model
The combined de novo reconstruction has a larger number of reactions, metabolites and genes than the previous S. coelicolor GEM (Fig 2). While a larger metabolic network does not necessarily imply a better network, we took advantage of the increased coverage of the de novo reconstruction by using it to curate iMK1208, while retaining the valuable contributions from earlier GEMs. The culminating model is called Sco4, the fourth major release of the S. coelicolor GEM. Through manual curation, a total of 398 metabolic reactions were selected from the combined model to expand the stoichiometric network of the previous GEM (S3 Table). These new reactions cover diverse subsystems including both primary and secondary metabolism (Fig 4A) and displayed close association with existing metabolites in the previous GEM (Fig 4B). Although MetaCyc- and KEGG-based reconstructions contributed roughly equally, MetaCyc-unique reactions are more involved in energy and secondary metabolism, while KEGG-unique reactions are more related to amino acid metabolism and degradation pathways (Fig 4C). The de novo reconstruction annotated genes to 11 reactions that had no gene association in the previous GEM (S4 Table). Together with 34 spontaneous reactions and 10 transport reactions identified by the MetaCyc reconstruction functions (S5 and S6 Tables), the resulting Sco4 model contains 2304 reactions, 1927 metabolites and 1522 genes (Fig 2).
The process of model curation using de novo reconstructions furthermore identified erroneous annotations in the previous GEM. Seventeen metabolites were annotated with invalid KEGG identifiers (S7 Table), impeding matching with the KEGG-based reconstructions. However, by annotating the reactions and metabolites to MetaCyc, we were still able to annotate all 17 metabolites with a valid KEGG identifier, using MetaCyc-provided KEGG annotations. While the KEGG identifiers used in iMK1208 were valid previously, they have since been removed from the KEGG database. Unfortunately, no changelogs are available to trace such revisions.
Simulations and predictions with Sco4
The quality of Sco4 was evaluated through various simulations. It displayed the same performance as iMK1208 in growth prediction on 64 different nutrient sources, with a consistent sensitivity of 90.6% (S8 Table). Experimentally measured growth rates in batch and chemostat cultivations were in good correlation with the growth rates predicted by Sco4 (Fig 5A).
A recent large-scale mutagenesis study produced and analyzed 51,443 S. coelicolor mutants, where each mutant carried a single Tn5 transposition randomly inserted in the genome [37]. No transposition insertions were detected in 79 so-called cold regions of the genome, harboring 132 genes of which 65 are annotated to reactions in Sco4 (S9 Table). The 132 genes are potentially essential, as insertions into these loci would have resulted in a lethal phenotype. However, as it is unclear whether gene essentiality is truly the cause behind the cold regions, we took the more conservative assumption that genes located outside cold regions are not essential and compared the non-essential gene sets. Simulation with Sco4 indicates a specificity (or true negative rate) of 0.901, which is an increase over the 0.876 of the previous model (Fig 5B).
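The specificity reported above is the fraction of experimentally non-essential genes that the model also predicts to be non-essential, TN / (TN + FP). A minimal Python sketch of that calculation follows. The gene names and set-based inputs are hypothetical and do not reproduce the data behind Fig 5B.

```python
def specificity(predicted_essential, observed_nonessential):
    """True negative rate: TN / (TN + FP).

    predicted_essential: set of genes the model predicts to be essential.
    observed_nonessential: set of genes assumed non-essential experimentally
    (here: genes located outside cold regions).
    """
    if not observed_nonessential:
        raise ValueError("need at least one observed non-essential gene")
    true_neg = sum(1 for g in observed_nonessential if g not in predicted_essential)
    false_pos = len(observed_nonessential) - true_neg
    return true_neg / (true_neg + false_pos)


# Made-up example: ten genes outside cold regions, one wrongly predicted essential.
observed = {f"gene_{i}" for i in range(10)}
predicted = {"gene_3", "gene_outside_set"}
value = specificity(predicted, observed)
print(round(value, 3))
```

Genes inside cold regions are deliberately left out of the denominator, mirroring the conservative assumption that only genes outside cold regions have a known (non-essential) status.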
The S. coelicolor genome project revealed a dense array of secondary metabolite gene clusters both in the core and arms of the linear chromosome (Bentley et al. 2002), and extensive efforts have been made to elucidate these biosynthetic pathways (Van Keulen and Dyson, 2014). The previous GEM of S. coelicolor included only three of these pathways (i.e. actinorhodin, calcium-dependent antibiotic and undecylprodigiosin). Through our de novo reconstruction, we captured the advances that have since been made in elucidating additional pathways: Sco4 describes the biosynthetic pathways of 6 more secondary metabolites (e.g. geosmin). These additional pathways were mainly obtained from the MetaCyc-based reconstruction (Fig 4C).
The expanded description of secondary metabolism was used to predict potential metabolic engineering targets for efficient antibiotic production in S. coelicolor. Flux scanning with enforced objective function (FSEOF) [34] was applied to all secondary metabolic pathways in Sco4 and suggested overexpression targets were compared, with significant overlap between different classes of secondary metabolites (Fig 6, S10 Table). In addition, several targets were predicted to increase production of all modelled secondary metabolites. Three reactions, constituting the pathway from histidine to N-formimidoyl-L-glutamate, and catalyzed by SCO3070, SCO3073 and SCO4932, were commonly identified as potential targets (S10 Table).
Discussion
The RAVEN toolbox aims to assist constraint-based modeling with a focus on network reconstruction and curation. A growing number of biological databases have been incorporated for automated GEM reconstruction (Fig 1). The generation of tissue/cell type-specific models through task-driven model reconstruction (tINIT) has been incorporated into RAVEN 2.0 as a built-in resource for human metabolic modeling [19,39]. RAVEN 2.0 was further expanded in this study by integrating the MetaCyc database, including experimentally elucidated pathways, chemically balanced reactions, as well as associated enzyme sequences [26]. This key enhancement brings new features toward high-quality reconstruction, such as inclusion of transport and spontaneous reactions (Table 1).
The performance of RAVEN 2.0 in de novo reconstruction was demonstrated by the large overlap of reactions between the automatically obtained draft model of S. coelicolor and the manually curated iMK1208 model [32]. This indicates that de novo reconstruction with RAVEN is an excellent starting point towards developing a high-quality model, while a combined de novo reconstruction can be produced within hours on a personal computer. We used the de novo reconstructions to curate the existing iMK1208 model, and the resulting Sco4 model was expanded with numerous reactions, metabolites and genes, in part representing recent progress in studies on metabolism of S. coelicolor and related species (Fig 4). We have exploited this new information from biological databases to predict novel targets for metabolic engineering toward establishing S. coelicolor as a potent host for a wide range of secondary metabolites (Fig 6). Therefore, RAVEN 2.0 can be used not only for de novo reconstruction but also for model curation and continuous updating, which is necessary for a published GEM to keep pace with incremental knowledge. We thus deposited Sco4 in an open GitHub repository for collaborative development with version control.
While RAVEN 2.0 addresses several obstacles and significantly improves GEM reconstruction and curation, a number of challenges remain to be resolved. One major obstacle encountered is matching of metabolites, whether by name or identifier (e.g. KEGG, MetaCyc, ChEBI). Incompatible metabolite nomenclature and incomplete or incorrect annotations all impede fully automatic matching and instead require intensive manual curation, especially when comparing and combining GEMs from different sources. Efforts have been made to address these issues, e.g. by simplifying manual curation using modelBorgifier [40]. Particularly worth noting is MetaNetX [41], where the MNXref namespace aims to provide a comprehensive cross-reference between metabolites and reactions from a wide range of databases, assisting model comparison and integration. Future developments in this direction should ultimately leverage this information to automatically reconcile metabolites and reactions across GEMs. Another major challenge is evaluation and tracking of GEM quality. Here we evaluated Sco4 with growth and gene essentiality simulations (Fig 5, S8 Table, S9 Table); however, the GEM modelling community would benefit from these and additional quality tests following community standards. Exciting ongoing progress here is memote: open-source software under development that contains a community-maintained, standardized set of metabolic model tests [42]. Given that the YAML export functionality in RAVEN already supports convenient tracking of model changes in a GitHub repository, this should ideally be combined with tracking model quality with memote, rendering RAVEN suitable for future GEM reconstruction and curation needs.
Material and methods
RAVEN toolbox development
The RAVEN Toolbox 1.0 was released as an open-source MATLAB package [9], which has since seen minor updates and bugfixes. Since 2016, the development of RAVEN has been organized and tracked at a public GitHub repository (https://github.com/SysBioChalmers/RAVEN). This repository provides a platform for the GEM reconstruction community, with users encouraged to report bugs, request new features and contribute to the development.
The RAVEN Toolbox is based on a defined model structure (S11 Table). Design choices dictate minor differences between COBRA and RAVEN structures; however, bi-directional model conversion is supported through ravenCobraWrapper. Through resolving previously conflicting function names, RAVEN 2.0 is now fully compatible with the COBRA Toolbox. Detailed documentation on the purpose, inputs and outputs of each function is provided in the doc folder.
MetaCyc-based reconstruction module
Novel algorithms were developed to facilitate de novo GEM reconstruction by utilizing the MetaCyc database [26]. In this module, corresponding MATLAB structures were generated from MetaCyc data files (version 21.0) that contained 3118 manually curated pathways with 13,689 metabolites and 15,309 reactions (Fig 7). A total of 17,394 enzymes are associated with these pathways and their protein sequences are included (protseq.fsa). Information from these structures is parsed by getModelFromMetaCyc to generate a model structure containing all metabolites, reactions and enzymes. This MetaCyc model can subsequently be used for de novo GEM reconstruction through the getMetaCycModelForOrganism function (Fig 7). A draft model is generated from MetaCyc enzymes (and associated reactions and metabolites) that show homology to the query protein sequences. An additional benefit is that MetaCyc reactions are mass- and charge-balanced, while curated transport enzymes in MetaCyc allow inclusion of transport reactions into the draft model.
In addition, MetaCyc provides 515 reactions that may occur spontaneously. As such reactions have no enzyme association, they are excluded from sequence-based reconstruction and can turn into gaps in the generated models. By cataloguing spontaneous reactions in MetaCyc, the addSpontaneousRxns function can retrieve spontaneous reactions depending on the presence of the relevant reactants in the draft model.
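The selection of spontaneous reactions can be pictured with the short Python sketch below. It assumes, as one plausible reading of "presence of the relevant reactants", that a spontaneous reaction is retrieved only when all of its substrates already occur in the draft model. That rule, the function name and the toy identifiers are assumptions for illustration and are not taken from addSpontaneousRxns.

```python
def select_spontaneous(spontaneous, model_metabolites):
    """Pick spontaneous reactions whose substrates are all in the draft model.

    spontaneous: dict mapping reaction id -> {metabolite: coefficient},
    with negative coefficients marking substrates.
    model_metabolites: iterable of metabolite ids present in the draft model.
    """
    present = set(model_metabolites)
    selected = []
    for rxn_id, stoich in spontaneous.items():
        substrates = {met for met, coeff in stoich.items() if coeff < 0}
        # Assumed rule: require at least one substrate, all of them present.
        if substrates and substrates <= present:
            selected.append(rxn_id)
    return sorted(selected)


# Hypothetical catalogue of spontaneous reactions and draft-model metabolites.
catalogue = {
    "SPONT_1": {"met_a": -1, "met_b": 1},
    "SPONT_2": {"met_b": -1, "met_x": -1, "met_c": 1},
    "SPONT_3": {"met_c": -1, "met_d": 1},
}
draft_metabolites = ["met_a", "met_b", "met_c"]
chosen = select_spontaneous(catalogue, draft_metabolites)
print(chosen)
```

SPONT_2 is skipped because met_x is absent from the draft, which shows how such a rule avoids adding reactions that would only introduce new disconnected metabolites.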
KEGG-based reconstruction module
In addition to MetaCyc-based GEM reconstruction, RAVEN 2.0 can utilize the KEGG database for de novo GEM reconstruction. The reconstruction algorithms were significantly enhanced in multiple aspects: the reformatted KEGG database in MATLAB format has been updated to version 82.0, and the pipeline to train KEGG Orthology (KO)-specific hidden Markov models (HMMs) has been expanded. Orthologous protein sequences, associated with a particular KO, are organised into non-redundant clusters with CD-HIT [43]. These clusters are used as input in multiple-sequence alignment with MAFFT [44], for increased accuracy and speed. The HMMs are then trained for prokaryotic and eukaryotic species with various protein redundancy cut-offs (100%, 90% or 50%) using HMMER3 [45] and can now be automatically downloaded when running getKEGGModelForOrganism.
Combining of MetaCyc- and KEGG-based draft models
To capitalize on the complementary information from MetaCyc- and KEGG-based reconstructions, RAVEN 2.0 facilitates combining draft models from both approaches into one unified draft reconstruction (Fig 2). Prior to combining, reactions shared by MetaCyc- and KEGG-based reconstructions are mapped using MetaCyc-provided cross-references to their respective KEGG counterparts (S12 Table). Additional reactions are associated by linkMetaCycKEGGRxns through matching the metabolites, aided by cross-references between MetaCyc and KEGG identifiers (S13 Table). Subsequently, the combineMetaCycKEGGModels function thoroughly queries the two models for identical reactions, discarding the KEGG versions while keeping the corresponding MetaCyc reactions. In the combined model, MetaCyc naming convention is preferentially used such that unique metabolites and reactions from KEGG-based draft model are replaced with their MetaCyc equivalents whenever possible. The combined draft model works as a starting point for additional manual curation, to result in a high-quality reconstruction.
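The merging rule described above can be sketched in a few lines of Python. The sketch represents each draft as a mapping from reaction identifier to a set of genes, renames KEGG reactions to their MetaCyc equivalent when a cross-reference exists, and keeps a single MetaCyc-named entry for shared reactions. Taking the union of gene associations for shared reactions is an assumption made here for illustration, and all identifiers are hypothetical; this is not the combineMetaCycKEGGModels implementation.

```python
def combine_drafts(metacyc, kegg, kegg_to_metacyc):
    """Merge two drafts, preferring MetaCyc identifiers.

    metacyc, kegg: dict mapping reaction id -> iterable of gene ids.
    kegg_to_metacyc: dict mapping KEGG reaction id -> MetaCyc reaction id.
    """
    combined = {rxn: set(genes) for rxn, genes in metacyc.items()}
    for kegg_id, genes in kegg.items():
        # Use the MetaCyc name when a cross-reference exists, else keep the KEGG id.
        target = kegg_to_metacyc.get(kegg_id, kegg_id)
        # Shared reactions collapse onto the MetaCyc entry (genes are unioned,
        # which is an assumption of this sketch).
        combined.setdefault(target, set()).update(genes)
    return combined


# Hypothetical drafts and cross-references.
metacyc_draft = {"META_RXN_1": {"geneA"}}
kegg_draft = {
    "KEGG_RXN_1": {"geneB"},   # same reaction as META_RXN_1
    "KEGG_RXN_2": {"geneC"},   # KEGG-only, but has a MetaCyc equivalent
    "KEGG_RXN_3": {"geneD"},   # KEGG-only, no cross-reference
}
xrefs = {"KEGG_RXN_1": "META_RXN_1", "KEGG_RXN_2": "META_RXN_2"}
merged = combine_drafts(metacyc_draft, kegg_draft, xrefs)
print(sorted(merged))
```

Only reactions without any cross-reference retain their KEGG identifier, so the combined draft ends up predominantly in the MetaCyc namespace.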
Miscellaneous improvements
RAVEN 2.0 contains a range of additional enhancements. Linear problems can be solved through either the Gurobi (Gurobi Optimization Inc., Houston, Texas) or MOSEK (MOSEK ApS, Copenhagen, Denmark) solvers. Various file formats are supported for import and export of models, including Microsoft Excel through Apache POI (The Apache Software Foundation, Wakefield, Massachusetts), the community standard SBML Level 3 Version 1 FBC Package Version 2 through libSBML [46] and YAML for easy tracking of differences between model files. Meanwhile, backwards compatibility ensures that Excel and SBML files generated by earlier RAVEN versions can still be imported.
From de novo draft GEMs to Sco4
An improved GEM of S. coelicolor, called Sco4 for the fourth major published model, was generated through RAVEN 2.0 following the pipeline illustrated in Fig 2. The model is based on the complete genome sequence of S. coelicolor A3(2), including the chromosome and two plasmids (GenBank accession: GCA_000203835.1) [27]. The MetaCyc-based draft model was generated with getMetaCycModelForOrganism using default cut-offs (bit-score ≥ 100, positives ≥ 45%). Two KEGG-based draft models were generated with getKEGGModelForOrganism by (i) using 'sco' as KEGG organism identifier, and (ii) querying the S. coelicolor proteome against HMMs trained on prokaryotic sequences with 90% sequence identity. These two models were merged with mergeModels, subsequently combined with the MetaCyc-based draft using combineMetaCycKEGGModels, followed by manual curation. Reactions were mapped from iMK1208 to MetaCyc and KEGG identifiers in a semi-automated manner (S1 Table). Metabolites in iMK1208 were associated with MetaCyc and KEGG identifiers through examining the mapped reactions (S2 Table). Pathway gaps and invalid metabolite identifiers were thus detected and revised accordingly.
Manual curation of the combined draft and iMK1208 culminated in the Sco4 model. Curation entailed identifying reactions from the combined draft, considering the absence of gene associations in iMK1208; explicit subsystem and/or pathway information; support from both MetaCyc and KEGG reconstructions; additional literature information; and potential taxonomic conflicts. Manual curation was particularly required for secondary metabolite biosynthetic pathways, due to high levels of sequence similarity among the synthetic domains of polyketide synthase and nonribosomal peptide synthetase [47]. The identified new reactions were added to Sco4, while retaining the previous manual curation underlying iMK1208. Spontaneous reactions were added through addSpontaneousRxns, while transport reactions annotated in the MetaCyc-based reconstruction were manually curated. Gene essentiality was simulated on iMK1208 and Sco4 with the COBRA function singleGeneDeletion, with a reduction in growth rate of more than 75% identifying essential genes. Potential targets for metabolic engineering were predicted using flux scanning with enforced objective function (FSEOF) [34]. The reconstruction and curation of Sco4 are provided as a MATLAB script in the ComplementaryScripts folder of the Sco4 GitHub repository.
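As an illustration of how the stated cut-offs act on homology search results, the Python sketch below filters a list of hits on bit-score ≥ 100 and positives ≥ 45%. Only the two thresholds come from the text; the record layout, field names and example values are hypothetical, and this is not code from getMetaCycModelForOrganism.

```python
def filter_hits(hits, min_bitscore=100.0, min_positives=45.0):
    """Keep hits that meet both cut-offs (both bounds inclusive)."""
    return [
        hit for hit in hits
        if hit["bitscore"] >= min_bitscore and hit["positives_pct"] >= min_positives
    ]


# Made-up hits between query proteins and database enzymes.
hits = [
    {"query": "protein_1", "enzyme": "enzyme_A", "bitscore": 250.0, "positives_pct": 62.0},
    {"query": "protein_2", "enzyme": "enzyme_B", "bitscore": 100.0, "positives_pct": 45.0},
    {"query": "protein_3", "enzyme": "enzyme_C", "bitscore": 180.0, "positives_pct": 40.0},
    {"query": "protein_4", "enzyme": "enzyme_D", "bitscore": 80.0, "positives_pct": 70.0},
]
kept = [hit["query"] for hit in filter_hits(hits)]
print(kept)
```

Requiring both criteria at once discards hits that are long but weakly similar (protein_3) as well as short, highly similar matches with a low bit-score (protein_4).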
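The essentiality criterion can be written down compactly. The Python sketch below classifies a gene as essential when the simulated knockout growth rate falls below 25% of the wild-type growth rate, i.e. a reduction of more than 75%. The growth values are made up, and the function stands in for the post-processing of knockout simulation results rather than for singleGeneDeletion itself.

```python
def essential_genes(wild_type_growth, knockout_growth, max_reduction=0.75):
    """Return genes whose knockout reduces growth by more than max_reduction.

    wild_type_growth: simulated wild-type growth rate (must be positive).
    knockout_growth: dict mapping gene id -> simulated knockout growth rate.
    """
    if wild_type_growth <= 0:
        raise ValueError("wild-type growth rate must be positive")
    # "More than 75% reduction" means strictly less than 25% of wild type.
    threshold = (1.0 - max_reduction) * wild_type_growth
    return sorted(gene for gene, mu in knockout_growth.items() if mu < threshold)


# Hypothetical knockout results, expressed relative to a wild-type rate of 1.0.
knockouts = {"gene_a": 0.0, "gene_b": 0.2, "gene_c": 0.25, "gene_d": 1.0}
essential = essential_genes(1.0, knockouts)
print(essential)
```

A knockout that leaves exactly 25% of wild-type growth (gene_c) is not counted, since the reduction is then equal to, not greater than, 75%.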
Model repository
The updated Sco4 model is deposited to a GitHub repository in MATLAB .mat, SBML L3V1 FBCv2 .xml, Excel .xlsx, YAML .yml and flat-text .txt formats (https://github.com/SysBioChalmers/Streptomyces_coelicolor-GEM). Users can not only download the most recent version of the model, but also report issues and suggest changes. Updates in the metabolic network or gene associations can readily be tracked by querying the difference in the flat-text model and YAML representations. As such, Sco4 aims to be a community model, where improved knowledge and annotation will incrementally and constantly refine the model of S. coelicolor.
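Tracking changes between two flat-text versions of a model amounts to a line-based diff, which version control systems perform automatically. The Python sketch below shows the same idea with the standard library difflib module. The one-reaction-per-line text format is a simplified, hypothetical stand-in for the actual flat-text and YAML model files.

```python
import difflib


def model_diff(old_text, new_text):
    """Return added ('+') and removed ('-') lines between two model versions."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines, only changes
        )
    )
    # Skip the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical flat-text models with one reaction per line.
old_model = "rxn1: A => B\nrxn2: B => C"
new_model = "rxn1: A => B\nrxn2: B <=> C\nrxn3: C => D"
changes = model_diff(old_model, new_model)
for line in changes:
    print(line)
```

Because each reaction occupies its own line, a changed reversibility and a newly added reaction each show up as separate, readable entries in the diff.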
Availability
RAVEN is an open source software package available in the GitHub repository (https://github.com/SysBioChalmers/RAVEN). The updated S. coelicolor genome-scale metabolic model Sco4 is available as a public GitHub repository at (https://github.com/SysBioChalmers/Streptomyces_coelicolor-GEM).
Supporting information
Acknowledgments
We thank Dr. Sylvain Prigent and Dr. Thomas Pfau for valuable discussions and suggestions.
Data Availability
Most relevant data are within the paper and its Supporting Information files. Supporting Information files are also accessible at: http://doi.org/10.6084/m9.figshare.6236903. The RAVEN Toolbox 2.0 is freely available from https://github.com/SysBioChalmers/RAVEN. The Sco4 model is freely available from https://github.com/SysBioChalmers/Streptomyces_coelicolor-GEM.
Funding Statement
The authors acknowledge funding for the ERASysApp project SYSTERACT provided by Västra Götalandsregionen (RUN 612-0436-15), http://www.vgregion.se/; and additional funding by the Novo Nordisk Foundation, http://novonordiskfonden.dk/ and the Knut and Alice Wallenberg Foundation, https://kaw.wallenberg.org/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.