Toward community standards and software for whole-cell modeling

Dagmar Waltemath; Jonathan R Karr; Frank T Bergmann; Vijayalakshmi Chelliah; Michael Hucka; Marcus Krantz; Wolfram Liebermeister; Pedro Mendes; Chris J Myers; Pinar Pir; Begum Alaybeyoglu; Naveen K Aranganathan; Kambiz Baghalian; Arne T Bittig; Paulo E Pinto Burke; Matteo Cantarelli; Yin Hoon Chew; Rafael S Costa; Joseph Cursons; Tobias Czauderna; Arthur P Goldberg; Harold F Gómez; Jens Hahn; Tuure Hameri; Daniel F Hernandez Gardiol; Denis Kazakiewicz; Ilya Kiselev; Vincent Knight-Schrijver; Christian Knüpfer; Matthias König; Daewon Lee; Audald Lloret-Villas; Nikita Mandrik; J Kyle Medley; Bertrand Moreau; Hojjat Naderi-Meshkin; Sucheendra K Palaniappan; Daniel Priego-Espinosa; Martin Scharm; Mahesh Sharma; Kieran Smallbone; Natalie J Stanford; Je-Hoon Song; Tom Theile; Milenko Tokic; Namrata Tomar; Vasundra Touré; Jannis Uhlendorf; Thawfeek M Varusai; Leandro H Watanabe; Florian Wendland; Markus Wolfien; James T Yurkovich; Yan Zhu; Argyris Zardilis; Anna Zhukova; Falk Schreiber

doi:10.1109/TBME.2016.2560762

Author manuscript; available in PMC: 2017 May 31.

Published in final edited form as: IEEE Trans Biomed Eng. 2016 Jun 10;63(10):2007–2014. doi: 10.1109/TBME.2016.2560762

Dagmar Waltemath ¹,†,✉, Jonathan R Karr ²,†,✉, Frank T Bergmann ³, Vijayalakshmi Chelliah ⁴, Michael Hucka ⁵, Marcus Krantz ⁶, Wolfram Liebermeister ⁷, Pedro Mendes ⁸, Chris J Myers ⁹, Pinar Pir ¹⁰, Begum Alaybeyoglu ¹¹, Naveen K Aranganathan ¹², Kambiz Baghalian ¹³, Arne T Bittig ¹⁴, Paulo E Pinto Burke ¹⁵, Matteo Cantarelli ¹⁶, Yin Hoon Chew ¹⁷, Rafael S Costa ¹⁸, Joseph Cursons ¹⁹, Tobias Czauderna ²⁰, Arthur P Goldberg ²¹, Harold F Gómez ²², Jens Hahn ²³, Tuure Hameri ²⁴, Daniel F Hernandez Gardiol ²⁵, Denis Kazakiewicz ²⁶, Ilya Kiselev ²⁷, Vincent Knight-Schrijver ²⁸, Christian Knüpfer ²⁹, Matthias König ³⁰, Daewon Lee ³¹, Audald Lloret-Villas ³², Nikita Mandrik ³³, J Kyle Medley ³⁴, Bertrand Moreau ³⁵, Hojjat Naderi-Meshkin ³⁶, Sucheendra K Palaniappan ³⁷, Daniel Priego-Espinosa ³⁸, Martin Scharm ³⁹, Mahesh Sharma ⁴⁰, Kieran Smallbone ⁴¹, Natalie J Stanford ⁴², Je-Hoon Song ⁴³, Tom Theile ⁴⁴, Milenko Tokic ⁴⁵, Namrata Tomar ⁴⁶, Vasundra Touré ⁴⁷, Jannis Uhlendorf ⁴⁸, Thawfeek M Varusai ⁴⁹, Leandro H Watanabe ⁵⁰, Florian Wendland ⁵¹, Markus Wolfien ⁵², James T Yurkovich ⁵³, Yan Zhu ⁵⁴, Argyris Zardilis ⁵⁵, Anna Zhukova ⁵⁶, Falk Schreiber ⁵⁷

¹Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

²Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

³BioQuant, University of Heidelberg, 69120 Heidelberg, Germany

⁴European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK

⁵Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA

⁶Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany

⁷Institute of Biochemistry, University Medicine Charité Berlin, 10117 Berlin, Germany

⁸Manchester Institute of Biotechnology and the School of Computer Science, University of Manchester, Manchester M1 7DN, UK and also with the Center for Quantitative Medicine and the Department of Cell Biology, University of Connecticut Health Center, Farmington, CT 06030, USA

⁹Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah 84112, USA

¹⁰Gebze Technical University, Kocaeli 41400, Turkey

¹¹Department of Chemical Engineering, Boğaziçi University, Bebek 34342, Turkey

¹²European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK

¹³Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK

¹⁴Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

¹⁵Institute of Science and Technology, Federal University of São Paulo, Brazil

¹⁶OpenWorm

¹⁷Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. Centre for Synthetic and Systems Biology, University of Edinburgh, Edinburgh EH9 3BF, UK

¹⁸Centre of Intelligent Systems-IDMEC, Instituto Superior Técnico, University of Lisbon, 1049-001 Lisboa, Portugal

¹⁹Systems Biology Laboratory, Melbourne School of Engineering, University of Melbourne, Parkville, VIC 3010, Australia, and also with the ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, Melbourne School of Engineering, University of Melbourne, Parkville, VIC 3010

²⁰Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia

²¹Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

²²Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland

²³Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany

²⁴Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland

²⁵Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland

²⁶Center for Statistics, Universiteit Hasselt, Hasselt BE3500, Belgium, and also with the Center for Innovative Research, Medical University of Białystok, Białystok 15-089, Poland

²⁷Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia

²⁸Babraham Institute, Cambridge CB22 3AT, UK

²⁹Institut für Informatik, University of Jena, 07743 Jena, Germany

³⁰Institute of Biochemistry, University Medicine Charité Berlin, 10117 Berlin, Germany, and also with the Institute for Theoretical Biology, Humboldt-University Berlin, Invalidenstraße 43, 10115 Berlin, Germany

³¹Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Republic of Korea

³²European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK

³³Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia

³⁴Department of Bioengineering, University of Washington, Seattle, WA 98195, USA

³⁵CoSMo Company, Lyon, France

³⁶Stem Cell and Regenerative Medicine Research Department, Iranian Academic Center for Education, Culture and Research (ACECR), Khorasan Razavi Branch, Mashhad, Iran

³⁷Rennes - Bretagne Atlantique Research Centre, Institute for Research in Computer Science and Automation, 35042 Rennes Cedex, France

³⁸Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, México

³⁹Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

⁴⁰Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Punjab 160062, India

⁴¹Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7DN, UK

⁴²Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7DN, UK

⁴³Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Republic of Korea

⁴⁴Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

⁴⁵Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland, and also with the Swiss Institute of Bioinformatics (SIB), CH-1015 Switzerland

⁴⁶Department of Dermatology, University Medicine, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany

⁴⁷Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

⁴⁸Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany

⁴⁹Department of Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland

⁵⁰Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah 84112, USA

⁵¹Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

⁵²Institute of Computer Science, University of Rostock, 18051 Rostock, Germany

⁵³Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA

⁵⁴Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC 3052, Australia

⁵⁵Centre for Synthetic and Systems Biology, University of Edinburgh, UK

⁵⁶Institut de Biochimie et Génétique Cellulaires, National Center for Scientific Research, and also with the University of Bordeaux, France, 33077 Bordeaux Cedex, France

⁵⁷Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia and also with the Department of Computer and Information Science, University of Konstanz, 78457 Konstanz, Germany

† These authors contributed equally to this work.

✉ Corresponding author.

Roles

Chris J Myers: Fellow, IEEE

PMCID: PMC5451320 NIHMSID: NIHMS859666 PMID: 27305665

Abstract

Objective

Whole-cell (WC) modeling is a promising tool for biological research, bioengineering, and medicine. However, substantial work remains to create accurate, comprehensive models of complex cells.

Methods

We organized the 2015 Whole-Cell Modeling Summer School to teach WC modeling and evaluate the need for new WC modeling standards and software by recoding a recently published WC model in SBML.

Results

Our analysis revealed several challenges to representing WC models using the current standards.

Conclusion

We, therefore, propose several new WC modeling standards, software, and databases.

Significance

We anticipate that these new standards and software will enable more comprehensive models.

Index Terms: Whole-cell modeling, Systems biology, Computational biology, Simulation, Standards, Education

[Photograph] The 2015 Whole-Cell Modeling Summer School in Rostock included the 54 participants listed in Table SI. Photo: University of Rostock IT and Media Center.

I. Introduction

Computational modeling is a powerful tool for biological research, bioengineering, and medicine to understand complex systems. It has been used to identify gene functions [1], engineer metabolic pathways [2], and identify drug targets [3]. Computational models also have the potential to help bioengineers design new microorganisms that can synthesize high value chemicals, sense toxins, and decontaminate waste, as well as help clinicians interpret individual ‘omics profiles and personalize medical therapy [4]. Realizing this potential requires more comprehensive models that can predict phenotype from genotype. In turn, this requires improved modeling and simulation standards and software [5–10].

Recently, Karr et al. developed the first whole-cell (WC) model which represents every individual gene function [11]. The model represents the life cycle of a single Mycoplasma genitalium bacterial cell and predicts the dynamics of every molecular species. The model is composed of 28 pathway sub-models that are represented using multiple mathematical formalisms including stochastic simulation, ordinary differential equations (ODEs), flux balance analysis (FBA), and Boolean rules (BRs). The model was implemented in MATLAB.

The M. genitalium model has been used to gain novel insights into non-genetic cell cycle regulation mechanisms [11], learn unknown kinetic rate parameters from phenotypic data [12], calculate the metabolic costs of synthetic circuits [13], and repurpose antibiotics [14].

Karr et al. extensively documented the model; developed the WholeCellKB [15], WholeCellSimDB [16], and WholeCellViz [17] software tools to provide user-friendly interfaces to the model; and published the model open-source. This has enabled other researchers to reuse the model [12–14].

However, significant domain expertise is still needed to reuse the model or to develop new WC models. The multi-algorithm modeling methodology is complex. The model is difficult to understand, reuse, and extend because it is described directly in terms of its numerical simulation rather than in a software-independent format. The model code is difficult to learn and reuse because it is large, complex, and intertwined with the details of the M. genitalium model. The simulation code is also slow. Furthermore, the simulation code requires the proprietary MATLAB software package.

New standards and software tools are needed to help researchers build and simulate WC models. They would help researchers reuse, reproduce, and compare models, as well as share models through repositories such as BioModels [18].

Several systems biology standards have been developed by the COmputational Modeling in BIology NEtwork (COMBINE) [8], including the Systems Biology Markup Language (SBML) [19], CellML [20], the Simulation Experiment Description Markup Language (SED-ML) [21], and the Systems Biology Graphical Notation (SBGN) [22] (Table I). SBML and CellML are formats for representing mathematical models. CellML describes the mathematics whereas SBML describes biological processes. Both support several modeling formalisms including ODEs and FBA. SED-ML describes and enables researchers to reproduce computational experiments. SBGN is a visual notation for describing biological processes. However, none of these standards have been used for WC modeling.

Table I.

Systems biology standards and standardization efforts.

Acronym	Name	Type	Description	Ref.
CellML	CellML	Standard	Describes models in terms of mathematical relationships	20
COMBINE	COmputational Modeling in BIology NEtwork	Community	Develops computational biology standards and software	8
SBGN	Systems Biology Graphical Notation	Standard	Describes biochemical pathway diagrams	23
SBML	Systems Biology Markup Language	Standard	Describes models in terms of biochemical processes	24
SBML Arrays	SBML Package: Arrays	Standard	Describes arrays	25
SBML Comp	SBML Package: Hierarchical Model Composition	Standard	Describes how models are composed from other models	26
SBML Distrib	SBML Package: Distributions	Standard	Describes random distributions	27
SBML FBC	SBML Package: Flux Balance Constraints	Standard	Describes constraint-based models	28
SBML Multi	SBML Package: Multistate and Multicomponent Species	Standard	Supports rule-based modeling	25
SBML Spatial	SBML Package: Spatial Processes	Standard	Describes spatially-resolved models	29
SED-ML	Simulation Experiment Description Markup Language	Standard	Describes computational experiments	21

We organized the 2015 Whole-Cell Modeling Summer School to train students in WC modeling and to evaluate the need for new WC modeling standards and software. The school focused on creating a reusable WC model by recoding the M. genitalium model in SBML. We focused on SBML because SBML is the most widely used systems biology standard and there was insufficient time to evaluate multiple standards. The school also aimed to improve numerous details of the model, visualize the model with SBGN, and describe model simulations with SED-ML. The latest versions of our SBML-encoded submodels and SBGN diagrams are available at https://github.com/whole-cell-tutors/wholecell/releases/tag/meeting-report.

Most importantly, the school generated extensive community discussion on how to best build and simulate WC models. This report describes the outcome of these discussions, including our recommendations for new standards and software to accelerate WC modeling. We also describe our progress toward recoding the M. genitalium model in SBML and the lessons that we learned about organizing research-based schools.

II. The 2015 Whole-Cell Modeling Summer School

The school was held March 9–13, 2015, at the University of Rostock, Germany. It was organized by D. Waltemath and F. Schreiber and funded by the Volkswagen Foundation. 43 students and nine instructors participated in the school. A follow-up meeting involving 15 of the original and six additional participants was held October 10–11, 2015, at the University of Utah, USA. All of the materials for the school are available at http://sites.google.com/site/vwwholecellsummerschool.

We advertised the school through community mailing lists, conference calendars, and websites. Applicants were asked to describe their experience and interest in WC modeling. We chose 43 participants from 118 applicants based on three criteria. (1) We identified the most qualified and enthusiastic applicants. (2) We gave preference to students, female applicants, and applicants from developing countries. (3) We selected participants to represent a broad range of scientific disciplines. We used the same criteria to select instructors.

The school began with introductory lectures on WC modeling and the existing systems biology standards by J. Karr and M. Hucka and introductory discussions on model composition, state representation, and stochastic modeling. Most of the school was devoted to active learning sessions in which the students and instructors were divided into 11 groups and challenged to use SBML to recode the M. genitalium model, use SBGN to visualize the model, and use SED-ML to simulate the model. Groups 1–8 encoded submodels. Group 9 developed a submodel integration scheme. Group 10 annotated and visualized the model. Group 11 helped all of the other groups understand, encode, and improve the model. Table SII lists the groups and participants of both meetings. Each day concluded with community discussions. In addition, the school included a poster session and networking activities.

The students learned about state-of-the-art WC modeling; the open challenges to building more complex models; open-source modeling software; the importance of reproducibility; and the SBML, SED-ML, and SBGN standards. The students also expanded their professional networks. Several of the students reported that the skills and knowledge they gained from the school would enhance their research.

We learned several lessons about organizing research-based schools. (1) Students enjoy working on research problems more than solving prescribed exercises. This engages students in the field, challenges them, and helps them build practical skills. (2) Research-based schools should have clear background knowledge expectations, learning objectives, and research goals. This helps students decide whether to participate, prepare, and learn efficiently. (3) Research-based schools should have a flexible schedule, multidisciplinary participants, and a high teacher-to-student ratio. This allows students to engage in impromptu discussions, draw on multiple perspectives, and get feedback and iterate quickly.

III. Toward an improved SBML-encoded WC model

In addition to teaching students about WC modeling and the systems biology standards, the school aimed to improve the M. genitalium model and to encode the model in SBML.

A. Submodel encoding

We pursued several strategies to encode submodels in SBML. Several groups encoded submodels by (1) reading the original documentation of the model, (2) drawing pathway diagrams using software tools such as CellDesigner [30] and VANTED [31], and (3) writing scripts to generate SBML from the diagrams. Other groups used model design tools such as Antimony [32], BioUML [33], COBRApy [34], COPASI [35], iBioSim [36], and libRoadRunner [37] to recode submodels based on the original documentation. A few of the groups encoded submodels by converting the MATLAB code to SBML. As an example, Fig. 1 and File S1 illustrate how we recoded the transcription submodel.

Fig. 1. Comparison of the original and SBML transcription submodels. (A) The original transcription submodel included two sub-submodels: (1) a Markov model that describes how RNA polymerase switches among freely diffusing, non-specifically bound, and initiating states and (2) an ad hoc stochastic model that describes how RNA polymerase initiates transcription, elongates individual bases by walking along DNA, and terminates transcripts. (B) We created the SBML transcription submodel by simplifying the original submodel. The SBML submodel only represents transcription initiation, elongation, and termination; lumps the initiation, elongation, and termination of each RNA species into a single reaction; and does not explicitly represent DNA-protein binding. (C) An equivalent population-based ad hoc stochastic simulation algorithm for the original submodel. The original submodel was implemented using a more efficient particle-based algorithm. To facilitate comparison with the population-based SBML version, we have described an equivalent population-based algorithm. (D) We also improved the SBML submodel by replacing the ad hoc stochastic simulation algorithm with the Gillespie algorithm. (E) Statistics of the original and improved transcription submodels in population-based representations.

We encountered several challenges to encoding the submodels in SBML. First, understanding the submodels was time-consuming because many students were not familiar with the modeled biology, many of the submodel details are described only in the MATLAB code, and the model documentation only summarizes the model. For these reasons, J. Karr, one of the authors of the original model, helped all of the groups understand the modeled biology and mathematics. Dr. Karr also helped several groups simplify their encoding tasks by recommending that they recode only the most important model components. For example, Dr. Karr suggested that the transcription group represent the transcription of each RNA species as a single lumped reaction rather than hundreds of thousands of individual base elongation reactions. It would have been challenging to recode the model without Dr. Karr. The essentiality of Dr. Karr’s guidance underscores the need for improved WC modeling methods and standards.

Second, it was difficult to encode the original serial and randomized algorithms in SBML because SBML does not explicitly represent sequential operations and plain SBML does not support random number generation. We overcame these problems by formalizing submodels as Gillespie algorithm stochastic simulations [38].
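To make the reformulation concrete, here is a minimal Python sketch of the Gillespie direct method [38] applied to a single RNA species with one lumped synthesis reaction and one first-order degradation reaction. The rate constants `K_SYN` and `K_DEG`, the species name `rna`, and the function name `gillespie` are assumptions chosen for illustration; they are not parameters of the M. genitalium model.

```python
import math
import random

K_SYN = 0.5  # hypothetical lumped synthesis rate (molecules per second)
K_DEG = 0.1  # hypothetical first-order degradation rate constant (per second)

# Each reaction is (propensity function, state change).
reactions = [
    (lambda c: K_SYN, {"rna": +1}),
    (lambda c: K_DEG * c["rna"], {"rna": -1}),
]


def gillespie(counts, reactions, t_end, rng):
    """Simulate with the Gillespie direct method; return [(time, state), ...]."""
    t = 0.0
    counts = dict(counts)
    trajectory = [(t, dict(counts))]
    while True:
        propensities = [fn(counts) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for propensity, (_, change) in zip(propensities, reactions):
            cumulative += propensity
            if threshold < cumulative:
                break
        for species, delta in change.items():
            counts[species] += delta
        trajectory.append((t, dict(counts)))
    return trajectory


trajectory = gillespie({"rna": 0}, reactions, t_end=100.0, rng=random.Random(0))
print(len(trajectory), trajectory[-1])
```

Because every event is a single reaction firing, the whole submodel can be written as a set of reactions with propensities, which is the form that SBML reactions and kinetic laws describe.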

Third, in many cases, we had to either enumerate the particle-based state representations used by the original model or approximate the original model. For example, the translation group approximated the original model by lumping all of the elongation reactions for each protein into a single reaction. The replication group used indicator variables to enumerate the particle-based chromosome representation from the original model. However, this enumerated representation requires millions of variables, which is prohibitively expensive, and makes it difficult to represent the exclusion of multiple proteins from binding the same base. Furthermore, it is impractical to edit this verbose enumerated representation.
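The size difference between the two representations can be sketched in a few lines of Python. The numbers below are assumptions for illustration only: a rounded genome length of 580,000 bases (M. genitalium has a genome of roughly 580 kb) and ten hypothetical DNA-binding protein types. The sketch also treats the chromosome as linear, which ignores its circularity.

```python
def particle_state_size(bound_proteins):
    """Particle-based: one (name, start, footprint) record per bound molecule."""
    return len(bound_proteins)


def indicator_state_size(genome_length, n_protein_types):
    """Enumerated: one 0/1 indicator variable per (protein type, base) pair."""
    return genome_length * n_protein_types


def violates_exclusion(bound_proteins):
    """Return True if any two bound proteins cover the same base.

    Trivial on particle records; with indicator variables the same rule
    becomes a constraint spanning many variables per base.
    """
    ordered = sorted(bound_proteins, key=lambda p: p[1])
    return any(nxt[1] < cur[1] + cur[2] for cur, nxt in zip(ordered, ordered[1:]))


# Hypothetical bound proteins: (name, start base, footprint in bases).
bound = [("rna_polymerase", 1_000, 75), ("dnaA", 5_000, 20)]

print(particle_state_size(bound))          # records needed for this state
print(indicator_state_size(580_000, 10))   # indicator variables needed
print(violates_exclusion(bound))
```

The particle-based state grows with the number of bound molecules, whereas the enumerated state grows with genome length times the number of protein types, regardless of how many proteins are actually bound.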

Fourth, we had to enumerate all of the arrays used by the original model because few SBML simulators support arrays. This created verbose SBML files that are difficult to interpret and maintain and slow to simulate.

In summary, we concluded that it is currently difficult to encode WC models in SBML. WC modeling would be accelerated by expanded software support for model composition, rule-based modeling, arrays, and random number generation.

B. Submodel improvement

We also improved several aspects of the original model. As described above, we replaced the ad hoc stochastic simulation algorithms and rate laws used by the original submodels with the Gillespie algorithm and mass action kinetics. As an example, Fig. 1 and File S1 compare the original and SBML versions of the transcription submodel. We anticipate that these changes will improve the biological accuracy of WC models. The original model used these ad hoc algorithms and rate laws to achieve sufficient performance. Going forward, a high-performance parallel simulator is needed to achieve adequate performance of the Gillespie algorithm.

C. Model integration

The integration group created a scheme for combining the submodels. First, they defined the global species as the union of all submodel species. Second, they standardized the species names to create consistent submodel-global species interfaces.
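These two steps can be sketched as follows. The naming convention in `standardize` (lower case with underscores) and the example species names are assumptions made for illustration; the source does not specify which convention the group adopted.

```python
def standardize(name):
    """Map a local species name to a global name (hypothetical convention)."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def build_global_species(submodel_species):
    """Build the global species list and each submodel's local-to-global map.

    submodel_species: dict of submodel name -> iterable of local species names.
    """
    interfaces = {}
    global_species = set()
    for submodel, local_names in submodel_species.items():
        mapping = {local: standardize(local) for local in local_names}
        interfaces[submodel] = mapping
        global_species.update(mapping.values())  # union over all submodels
    return sorted(global_species), interfaces


global_species, interfaces = build_global_species({
    "transcription": ["ATP", "RNA-polymerase"],
    "translation": ["atp", "Ribosome"],
})
print(global_species)
print(interfaces["transcription"])
```

Keeping an explicit local-to-global map per submodel lets each submodel retain its own names while still reading from and writing to one shared state.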

Third, the group designed a new multi-algorithm simulation strategy to overcome the limitations of the original simulation algorithm. In particular, the group sought to correctly implement the arrow of time by integrating submodels within the same time step based on the same input state. The integration group also sought to develop an algorithm that has a variable time step that can be optimized to balance accuracy and performance. (1) The group considered sequentially integrating the submodels within each time step and setting the time step small enough that only one submodel would advance the cell state within each time step. However, this strategy is prohibitively expensive. (2) The group considered generalizing the original algorithm by dividing each of the global species pools into multiple, independent sub-species pools for each submodel; integrating the submodels in parallel; and merging the sub-species to update the global species. However, it is difficult to apply this strategy to coupled variables such as those that represent the protein occupancy of the chromosome. (3) The group decided to interpret the species changes predicted by each submodel as requests and implement a central controller that accepts or rejects these changes at the end of each time step to update the global species. This strategy is computationally efficient and generalizable.
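A minimal sketch of the third strategy is shown below. The source does not state how the controller decides which requests to accept, so the rule used here is purely an assumption: requests are considered in submodel order, and a request is accepted only if applying it leaves every species count non-negative. The submodel functions and species names are likewise hypothetical.

```python
def step(global_counts, submodels):
    """Advance the global state by one time step.

    submodels: list of (name, fn); fn maps a state snapshot to a request,
    i.e. a dict of species -> requested change.
    Returns (new global counts, names of submodels whose requests were accepted).
    """
    snapshot = dict(global_counts)  # every submodel sees the same input state
    requests = [(name, fn(snapshot)) for name, fn in submodels]

    new_counts = dict(global_counts)
    accepted = []
    for name, request in requests:
        # Assumed acceptance rule: reject any request that would overdraw a pool.
        feasible = all(new_counts.get(s, 0) + d >= 0 for s, d in request.items())
        if feasible:
            for species, delta in request.items():
                new_counts[species] = new_counts.get(species, 0) + delta
            accepted.append(name)
    return new_counts, accepted


# Two hypothetical submodels competing for the same ATP pool.
submodels = [
    ("transcription", lambda state: {"atp": -2, "rna": +1}),
    ("translation", lambda state: {"atp": -2, "protein": +1}),
]
print(step({"atp": 3}, submodels))
```

Because all requests are computed from one snapshot before any is applied, no submodel observes changes made by another within the same time step.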

Lastly, the group explored implementing this algorithm using both the SBML hierarchical model composition package [26] and SED-ML shared variables. The group concluded that both implementations are feasible. The group used iBioSim to test these strategies because iBioSim is one of the only SBML-compatible simulators that supports model composition.

D. Annotation, documentation, and visualization

The documentation group was responsible for annotating the model. The group aimed to define every model element independently from external databases and to provide cross references to databases where possible to help users interpret the model. For example, they used InChI [40] to define small molecule species in terms of structures. They defined DNA, RNA, and protein species as polymers of small molecules. The group wrote scripts to identify cross references for each model entity. However, many entities are not represented by any database. The group contributed the missing metabolite structures to ChEBI [39] and concluded that the biological databases must be expanded to help aggregate data for models.
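The annotation approach can be illustrated with a small Python sketch in which each species carries a structural definition plus optional cross references, and a helper reports the entities that still lack a database entry. The record layout and the function name `missing_cross_references` are assumptions for illustration and do not reproduce the group's scripts.

```python
def missing_cross_references(entities):
    """Return the ids of entities that have no database cross reference."""
    return sorted(e["id"] for e in entities if not e.get("cross_references"))


entities = [
    {
        "id": "water",
        "inchi": "InChI=1S/H2O/h1H2",        # structure-based definition
        "cross_references": ["CHEBI:15377"],  # optional pointer to a database
    },
    {
        "id": "carbon_dioxide",
        "inchi": "InChI=1S/CO2/c2-1-3",
        "cross_references": [],  # deliberately left empty to exercise the check
    },
]

print(missing_cross_references(entities))
```

Defining each species by its structure first means the model remains interpretable even for entities that no database yet covers.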

The group also helped the other groups visualize submodels by providing advice on SBGN and diagramming tools such as SBGN-ED [41], a VANTED add-on for creating, editing and validating SBGN diagrams. The main visualization problem encountered by the group was that WC models require large, intuitive diagrams that are difficult to lay out automatically.

E. Progress and future work

We produced draft SBML and SBGN versions of the submodels. However, significant work remains to combine, identify, and verify the submodels. Using the lessons learned, a subgroup of the participants is continuing to recode the submodels and integrate them into a single model. We expect that the final model will be more scalable, more extensible, and easier to use than the original model. We also plan to build an SBML-compatible multi-algorithm simulator by expanding analysis tools such as iBioSim and BioUML.

After recoding the model, we plan to identify and validate the new model. We will validate the model in two steps. (1) We will use the experimental data that was used to validate the original model. (2) To more rigorously validate the new model, we will compare the model to newly published single-gene deletion strain growth rates [12] that were not available when the original model was developed.

We aim to publish the SBML-encoded model to BioModels, along with SED-ML tests, SBGN diagrams, and textual documentation. Publication in BioModels will make the model searchable, retrievable, and reusable. We believe this valuable community resource will demonstrate how to describe WC models in standard formats, and it will help other researchers build upon the model.

IV. Toward SBML-, SED-ML-, and SBGN-based standards for WC modeling

The school was the first attempt to encode a WC model using standards. Thus, we were not surprised to learn that the current standards and community software do not easily support WC modeling. Importantly, the school generated ideas for new WC modeling standards and software that will enable researchers to build vastly more comprehensive models.

A. New standards

Two new standards are needed to facilitate WC modeling. A new SBML package should be created to support DNA, RNA, and protein sequence-based reaction patterns. This would enable researchers to easily model sequence-dependent reactions such as the methylation or protein binding of specific DNA motifs. This package would also help integrate genomics and bioinformatics with systems modeling.

SBGN must also be expanded to support (1) hybrid diagrams that contain Process Description, Entity Relationship, and Activity Flow elements and (2) visualizations at multiple levels of granularity.

B. New software tools and databases

Several new software tools and databases are needed to accelerate WC modeling (Table II). A high-performance simulator must be developed. This simulator should be parallelized to enable the simulation of vastly larger models that require more computing and memory than are available on a single machine. This requires research to determine how to concurrently integrate mathematically heterogeneous submodels that share state. The simulator should leverage recent advances in parallel discrete event simulation [42].

Table II.

New standards and software needed to accelerate WC modeling.

Type	Description
Database	Expanded molecular biological databases such as ChEBI [39]
Software	Data curation tools for aggregating the data to build models
Software	Pathway/genome database to organize model training data
Standard	Sequence- and rule-based multi-algorithmic modeling language
Software	Model design tools that generate models from pathway/genome databases
Software	Distributed parameter estimation tools
Software	Frameworks for systematically verifying models
Software	High-performance, parallel, rule-based multi-algorithm simulator
Standard	Extended SBGN standard for hybrid maps containing Process Description, Entity Relationship, and Activity Flow nodes
Software	Visualization software that supports contextual zooming

The simulator must also implement the SBML Multistate and Multicomponent Species package [43] to support rule-based modeling. This will enable more succinct model descriptions, making models easier to understand and edit. For example, translation could be described using a single reaction pattern parameterized by mRNA-specific translation initiation rates rather than by enumerating each individual reaction. By separating mathematical descriptions from parameter values, reaction patterns will also clarify the connection between dynamical models and their underlying data. Implementing this package would also enable modelers to efficiently simulate models with combinatorial state spaces, which, in turn, will enable the encoding of more complex models.
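The idea of one reaction pattern plus a table of parameters can be sketched in plain Python. This is not the syntax of the SBML Multistate and Multicomponent Species package; it only illustrates how a single pattern of the form mRNA_g → mRNA_g + protein_g expands into one reaction per gene. The gene identifiers and rate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    id: str
    reactants: tuple
    products: tuple
    rate_constant: float


def expand_translation_pattern(initiation_rates):
    """Expand the pattern 'mrna_g -> mrna_g + protein_g' for every gene g.

    initiation_rates: dict of gene id -> mRNA-specific initiation rate.
    """
    return [
        Reaction(
            id=f"translation_{gene}",
            reactants=(f"mrna_{gene}",),
            products=(f"mrna_{gene}", f"protein_{gene}"),
            rate_constant=rate,
        )
        for gene, rate in sorted(initiation_rates.items())
    ]


# Hypothetical parameter table: the only part that differs between genes.
rates = {"gene_b": 0.2, "gene_a": 0.1}
for reaction in expand_translation_pattern(rates):
    print(reaction)
```

Editing the model then means changing either the one pattern or the parameter table, rather than hundreds of enumerated reactions.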

Ultimately, to accurately predict phenotypes, WC models must also represent spatially-dependent processes. Currently, researchers are independently pursuing WC and spatial modeling. For example, the M. genitalium model only represents three compartments, and the most advanced spatial models only represent individual pathways. WC and spatial modeling should be combined by adding support for the SBML Spatial Processes package [29] to the new WC simulator.

New model design software must be developed to help researchers quickly build WC models. This software should help researchers systematically build WC models from experimental data organized into pathway/genome databases. In turn, this software will help researchers build bigger models.

New data curation tools are needed to aggregate data to build more comprehensive models. The software should automatically aggregate data from public databases, as well as accelerate manual curation from individual publications. This software will also make WC models more reproducible by automatically recording each data source. Natural language processing [44], crowdsourcing [45], and machine learning should also be explored to accelerate data curation.

New pathway/genome database software is needed to organize the data required to build WC models. To clarify the connection between computational models and their underlying experimental data, this software should use semantic annotations to describe how experimental data is used to build computational models.

New model parameter estimation and model verification tools are also needed to identify and verify computationally expensive WC models. To better estimate WC models, we must generalize our model reduction methods and adopt distributed numerical optimization techniques [46]. To more systematically verify WC models, we should adopt formal probabilistic verification techniques from electrical engineering [47].

New algorithms are needed to automatically create intuitive visualizations of large networks, and SBGN viewers should use contextual zooming to display diagrams at multiple levels of granularity.

In addition, biological databases, such as ChEBI, must be expanded to help researchers annotate WC models in terms of external entities.

C. Systematic WC modeling pipeline

The new standards and software tools will enable a five-step approach to WC model-driven discovery (Fig. 2). (1) Researchers will use data curation tools to aggregate heterogeneous data into pathway/genome databases. These databases will use semantic annotations to describe the connection between models and their underlying data. (2) Researchers will use design tools to build WC models from pathway/genome databases. These tools will export models to software-independent formats such as SBML. (3) Model identification and verification tools will be used to estimate parameters and test models. (4) A multi-algorithm simulator will be used to conduct in silico experiments. (5) Simulation databases and visualization software such as WholeCellSimDB and WholeCellViz will be used to discover new biology by visualizing and analyzing in silico experiments.

Fig. 2. WC modeling workflow. Researchers will (1) assemble data into pathway/genome databases, (2) use these databases to construct models, (3) identify and verify models, (4) use multi-algorithm simulators to conduct *in silico* experiments, and (5) analyze these experiments to discover biology.

Together, this pipeline will enable more researchers to more easily build, manage, simulate, and reproduce WC models. These new tools will also enable researchers to build more comprehensive models of more complex eukaryotic cells. Ultimately, this will enable WC modeling to support synthetic biology and personalized medicine.

V. Conclusion

The 2015 Whole-Cell Modeling Summer School trained young scientists in WC modeling and standards by challenging them to recode a WC model in SBML. Additional courses are needed to provide theoretical training in multi-algorithm modeling, model reduction, and parameter estimation, as well as practical training in WC model building.

We made significant strides toward recoding the model in SBML. We also improved the model by replacing the ad hoc algorithms and rate laws used by the original model with the Gillespie algorithm and mass action kinetics. We designed an improved multi-algorithm simulation meta-algorithm. Through validating the model by comparison to quantitative growth rate measurements, we anticipate that we will also discover and add several unknown parallel pathways to the model. We have produced preliminary SBML versions of all of the submodels of the M. genitalium model and we are working to develop a software program to simulate the combined model. We plan to publish the new SBML-encoded model to BioModels.
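For readers unfamiliar with the combination of the Gillespie algorithm [38] and mass action kinetics mentioned above, the following is a minimal Python sketch of the direct method on a toy reversible binding system. It is not the recoded M. genitalium model or any of its submodels; the species, rate constants, and the `gillespie` function are hypothetical, and the propensity formula assumes each reactant species appears only once per reaction.

```python
import random


def gillespie(initial_counts, reactions, t_end, seed=0):
    """Direct-method stochastic simulation with mass action propensities.

    Each reaction is a tuple (reactants, products, k). The propensity is
    k times the product of the reactant counts, which assumes that no
    species appears twice on the reactant side.
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)
    t = 0.0
    trajectory = [(t, dict(counts))]
    while True:
        propensities = []
        for reactants, _products, k in reactions:
            a = k
            for species in reactants:
                a *= counts[species]
            propensities.append(a)
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fires, weighted by its propensity.
        (index,) = rng.choices(range(len(reactions)), weights=propensities)
        reactants, products, _k = reactions[index]
        for species in reactants:
            counts[species] -= 1
        for species in products:
            counts[species] += 1
        trajectory.append((t, dict(counts)))
    return trajectory


# Toy system (hypothetical values): A + B <-> AB
reactions = [
    (("A", "B"), ("AB",), 0.01),  # binding
    (("AB",), ("A", "B"), 0.1),   # unbinding
]
trajectory = gillespie({"A": 50, "B": 30, "AB": 0}, reactions, t_end=10.0, seed=1)
final_time, final_counts = trajectory[-1]
print("events:", len(trajectory) - 1, "final time:", final_time, "final counts:", final_counts)
```

Because every event moves one molecule between the free and bound pools, the totals A + AB and B + AB stay constant along the trajectory, which is a convenient sanity check for a sketch like this.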

Most importantly, our community discussions generated clear goals for new WC modeling software and standards. We recommend that researchers develop a new SBML-compatible simulator that supports both model composition and sequence- and rule-based modeling, as well as develop new model design, parameter estimation, model testing, and visualization tools. We also recommend expanding the biological databases to facilitate model building and annotation. Furthermore, we believe that SBGN should be extended to support hybrid diagrams, advanced graph layout, and contextual zooming. Lastly, we recommend evaluating CellML as another potential WC modeling standard.

In summary, we believe that WC modeling will be an important tool for biological science, bioengineering, and medicine. Achieving this potential requires new WC modeling software and standards. In turn, this requires expanding the WC modeling field, including training young researchers.

Acknowledgments

The Rostock and Utah meetings were supported by the Volkswagen Foundation (Grant 88495 to D. W. and F. S.). J. R. K. was supported by the James S. McDonnell Foundation Postdoctoral Fellowship Award in Studying Complex Systems and National Science Foundation grant 1548123. J. C. was supported by the Australian Research Council Centre of Excellence in Convergent Bio-Nano Science and Technology (project CE140100036).

Contributor Information

Dagmar Waltemath, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Jonathan R. Karr, Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Frank T. Bergmann, BioQuant, University of Heidelberg, 69120 Heidelberg, Germany.

Vijayalakshmi Chelliah, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK.

Michael Hucka, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA.

Marcus Krantz, Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany.

Wolfram Liebermeister, Institute of Biochemistry, University Medicine Charité Berlin, 10117 Berlin, Germany.

Pedro Mendes, Manchester Institute of Biotechnology and the School of Computer Science, University of Manchester, Manchester M1 7DN, UK and also with the Center for Quantitative Medicine and the Department of Cell Biology, University of Connecticut Health Center, Farmington, CT 06030, USA.

Chris J. Myers, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah 84112, USA.

Pinar Pir, Gebze Technical University, Kocaeli 41400, Turkey.

Begum Alaybeyoglu, Department of Chemical Engineering, Boǧaziçi University, Bebek 34342, Turkey.

Naveen K Aranganathan, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK.

Kambiz Baghalian, Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, UK.

Arne T. Bittig, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Paulo E. Pinto Burke, Institute of Science and Technology, Federal University of São Paulo, Brazil.

Matteo Cantarelli, OpenWorm.

Yin Hoon Chew, Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. Centre for Synthetic and Systems Biology, University of Edinburgh, Edinburgh EH9 3BF, UK.

Rafael S. Costa, Centre of Intelligent Systems-IDMEC, Instituto Superior Técnico, University of Lisbon, 1049-001 Lisboa, Portugal.

Joseph Cursons, Systems Biology Laboratory, Melbourne School of Engineering, University of Melbourne, Parkville, VIC 3010, Australia, and also with the ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, Melbourne School of Engineering, University of Melbourne, Parkville, VIC 3010.

Tobias Czauderna, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia.

Arthur P. Goldberg, Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Harold F. Gómez, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

Jens Hahn, Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany.

Tuure Hameri, Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland.

Daniel F. Hernandez Gardiol, Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland.

Denis Kazakiewicz, Center for Statistics, Universiteit Hasselt, Hasselt BE3500, Belgium, and also with the Center for Innovative Research, Medical University of Białystok, Białystok 15-089, Poland.

Ilya Kiselev, Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia.

Vincent Knight-Schrijver, Babraham Institute, Cambridge CB22 3AT, UK.

Christian Knüpfer, Institut für Informatik, University of Jena, 07743 Jena, Germany.

Matthias König, Institute of Biochemistry, University Medicine Charité Berlin, 10117 Berlin, Germany, and also with the Institute for Theoretical Biology, Humboldt-University Berlin, Invalidenstraße 43, 10115 Berlin, Germany.

Daewon Lee, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Republic of Korea.

Audald Lloret-Villas, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge CB10 1SD, UK.

Nikita Mandrik, Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia.

J. Kyle Medley, Department of Bioengineering, University of Washington, Seattle, WA 98195, USA.

Bertrand Moreau, CoSMo Company, Lyon, France.

Hojjat Naderi-Meshkin, Stem Cell and Regenerative Medicine Research Department, Iranian Academic Center for Education, Culture and Research (ACECR), Khorasan Razavi Branch, Mashhad, Iran.

Sucheendra K. Palaniappan, Rennes - Bretagne Atlantique Research Centre, Institute for Research in Computer Science and Automation, 35042 Rennes Cedex, France.

Daniel Priego-Espinosa, Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, México.

Martin Scharm, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Mahesh Sharma, Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Punjab 160062, India.

Kieran Smallbone, Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7DN, UK.

Natalie J. Stanford, Manchester Centre for Integrative Systems Biology, University of Manchester, Manchester M1 7DN, UK.

Je-Hoon Song, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Republic of Korea.

Tom Theile, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Milenko Tokic, Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland, and also with the Swiss Institute of Bioinformatics (SIB), CH-1015 Switzerland.

Namrata Tomar, Department of Dermatology, University Medicine, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany.

Vasundra Touré, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Jannis Uhlendorf, Department of Biology, Humboldt University of Berlin, 10115 Berlin, Germany.

Thawfeek M Varusai, Department of Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland.

Leandro H. Watanabe, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah 84112, USA.

Florian Wendland, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

Markus Wolfien, Institute of Computer Science, University of Rostock, 18051 Rostock, Germany.

James T. Yurkovich, Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.

Yan Zhu, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC 3052, Australia.

Argyris Zardilis, Centre for Synthetic and Systems Biology, University of Edinburgh, UK.

Anna Zhukova, Institut de Biochimie et Génétique Cellulaires, National Center for Scientific Research, and also with the University of Bordeaux, France, 33077 Bordeaux Cedex, France.

Falk Schreiber, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia and also with the Department of Computer and Information Science, University of Konstanz, 78457 Konstanz, Germany.

References

1. Reed JL, Patel TR, Chen KH, et al. Systems approach to refining genome annotation. Proc Natl Acad Sci U S A. 2006;103(46):17480–17484. doi: 10.1073/pnas.0603364103.
2. Lee JW, Na D, Park JM, et al. Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat Chem Biol. 2012;8(6):536–546. doi: 10.1038/nchembio.970.
3. Lee DS, Burd H, Liu J, et al. Comparative genome-scale metabolic reconstruction and flux balance analysis of multiple Staphylococcus aureus genomes identify novel antimicrobial drug targets. J Bacteriol. 2009;191(12):4015–4024. doi: 10.1128/JB.01743-08.
4. Carrera J, Covert MW. Why build whole-cell models? Trends Cell Biol. 2015;25(12):719–722. doi: 10.1016/j.tcb.2015.09.004.
5. Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Curr Opin Biotechnol. 2014;28:111–115. doi: 10.1016/j.copbio.2014.01.012.
6. Karr JR, Takahashi K, Funahashi A. The principles of whole-cell modeling. Curr Opin Microbiol. 2015;27:18–24. doi: 10.1016/j.mib.2015.06.004.
7. Karr JR, Williams AH, Zucker JD, et al. Summary of the DREAM8 Parameter Estimation Challenge: Toward parameter identification for whole-cell models. PLoS Comput Biol. 2015;11(5):e1004096. doi: 10.1371/journal.pcbi.1004096.
8. Hucka M, Nickerson DP, Bader GD, et al. Promoting coordinated development of community-based information standards for modeling in biology: the COMBINE initiative. Frontiers in Bioengineering and Biotechnology. 2015;3(19). doi: 10.3389/fbioe.2015.00019.
9. Klipp E, Liebermeister W, Helbig A, et al. Systems biology standards—the community speaks. Nat Biotechnol. 2007;25(4):390–391. doi: 10.1038/nbt0407-390.
10. Büchel F, Rodriguez N, Swainston N, et al. Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC Syst Biol. 2013;7(1):116. doi: 10.1186/1752-0509-7-116.
11. Karr JR, Sanghvi JC, Macklin DN, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150(2):389–401. doi: 10.1016/j.cell.2012.05.044.
12. Sanghvi JC, Regot S, Carrasco S, et al. Accelerated discovery via a whole-cell model. Nat Methods. 2013;10(12):1192–1195. doi: 10.1038/nmeth.2724.
13. Purcell O, Jain B, Karr JR, et al. Towards a whole-cell modeling approach for synthetic biology. Chaos. 2013;23(2):025112. doi: 10.1063/1.4811182.
14. Kazakiewicz D, Karr JR, Langner K, Plewczynski D. A combined systems and structural modeling approach repositions antibiotics for Mycoplasma genitalium. Comput Biol Chem. 2015;59:91–97. doi: 10.1016/j.compbiolchem.2015.07.007.
15. Karr JR, Sanghvi JC, Macklin DN, et al. WholeCellKB: model organism databases for comprehensive whole-cell models. Nucleic Acids Res. 2013;41(Database issue):D787–D792. doi: 10.1093/nar/gks1108.
16. Karr JR, Phillips NC, Covert MW. WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions. Database. 2014;2014:bau095. doi: 10.1093/database/bau095.
17. Lee R, Karr JR, Covert MW. WholeCellViz: data visualization for whole-cell models. BMC Bioinformatics. 2013;14:253. doi: 10.1186/1471-2105-14-253.
18. Chelliah V, Juty N, Ajmera I, et al. BioModels: ten-year anniversary. Nucleic Acids Res. 2015;43(D1):D542–D548. doi: 10.1093/nar/gku1181.
19. Hucka M, Finney A, Sauro HM, et al. The Systems Biology Markup Language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015.
20. Hedley WJ, Nelson MR, Bullivant DP, Nielson PF. A short introduction to CellML. Philos Trans R Soc Lond A. 2001;359:1073–1089.
21. Waltemath D, Adams R, Bergmann F, et al. Reproducible computational biology experiments with SED-ML—the Simulation Experiment Description Markup Language. BMC Syst Biol. 2011;5(1):198. doi: 10.1186/1752-0509-5-198.
22. Le Novère N, Hucka M, Mi H, et al. The Systems Biology Graphical Notation. Nat Biotechnol. 2009;27:735–741. doi: 10.1038/nbt.1558.
23. Le Novère N, Hucka M, Mi H, et al. The Systems Biology Graphical Notation. Nat Biotechnol. 2009;27(8):735–741. doi: 10.1038/nbt.1558.
24. Hucka M, Bergmann FT, Hoops S, et al. The Systems Biology Markup Language (SBML): Language specification for Level 3 Version 1 core. Journal of Integrative Bioinformatics. 2015;12(2):266. doi: 10.2390/biecoll-jib-2015-266.
25. Waltemath D, Bergmann FT, Chaouiya C, et al. Meeting report from the fourth meeting of the Computational Modeling in Biology Network (COMBINE). Stand Genomic Sci. 2014;9(3). doi: 10.1186/s40793-018-0320-4.
26. Smith LP, Hucka M, Hoops S, et al. SBML level 3 package: Hierarchical model composition, version 1 release 3. Journal of Integrative Bioinformatics. 2015;12(2):268. doi: 10.2390/biecoll-jib-2015-268.
27. Moodie SL, Smith LP, Le Novère N, et al. The distributions package for SBML level 3. 2015. [Online]. Available: http://sourceforge.net/p/sbml/code/HEAD/tree/trunk/specifications/sbml-level-3/version-1/distrib/sbml-level-3-distrib-package-proposal.pdf?format=raw [accessed: 2016-02-26].
28. Olivier BG, Bergmann FT. The Systems Biology Markup Language (SBML) level 3 package: Flux balance constraints. Journal of Integrative Bioinformatics. 2015;12(2):269. doi: 10.2390/biecoll-jib-2015-269.
29. Schaff JC, Lakshminarayana A, Smith LP, et al. SBML level 3 package: Spatial processes. 2015.
30. Funahashi A, Matsuoka Y, Jouraku A, et al. CellDesigner 3.5: a versatile modeling tool for biochemical networks. Proc IEEE. 2008;96(8):1254–1265.
31. Rohn H, Junker A, Hartmann A, et al. VANTED v2: a framework for systems biology applications. BMC Syst Biol. 2012;6:139. doi: 10.1186/1752-0509-6-139.
32. Smith LP, Bergmann FT, Chandran D, Sauro HM. Antimony: a modular model definition language. Bioinformatics. 2009;25(18):2452–2454. doi: 10.1093/bioinformatics/btp401.
33. Kolpakov F. BioUML: visual modeling, automated code generation and simulation of biological systems. Proc BGRS. 2006;3:281–285.
34. Ebrahim A, Lerman JA, Palsson BO, Hyduke DR. COBRApy: constraints-based reconstruction and analysis for Python. BMC Syst Biol. 2013;7(1):74. doi: 10.1186/1752-0509-7-74.
35. Hoops S, Sahle S, Lee C, et al. COPASI — a COmplex PAthway SImulator. Bioinformatics. 2006;22:3067–3074. doi: 10.1093/bioinformatics/btl485.
36. Madsen C, Myers CJ, Patterson T, et al. Design and test of genetic circuits using iBioSim. IEEE Des Test Comput. 2012;29(3).
37. Somogyi ET, Bouteiller J-M, Glazier JA, et al. libRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics. 2015. doi: 10.1093/bioinformatics/btv363.
38. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977;81(25):2340–2361.
39. Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44(D1):D1214–D1219. doi: 10.1093/nar/gkv1031.
40. Heller SR, McNaught A, Stein S, et al. InChI - the worldwide chemical structure identifier standard. J Cheminform. 2013;5(1):7. doi: 10.1186/1758-2946-5-7.
41. Czauderna T, Klukas C, Schreiber F. Editing, validating and translating of SBGN maps. Bioinformatics. 2010;26(18):2340–2341. doi: 10.1093/bioinformatics/btq407.
42. Goldberg AP, Chew YH, Karr JR. Toward scalable whole-cell modeling of human cells. Prin Adv Discret Simul. ACM SIGSIM; 2016.
43. Zhang F, Meier-Schellersheim M. SBML Level 3 Package Specification: Multistate/Multicomponent Species (Version 1, Release 0.1 Draft 369). 2015. [Online]. Available: http://sbml.org/Documents/Specifications/SBML_Level_3/Packages/multi [accessed: 2015-05-25].
44. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–551. doi: 10.1136/amiajnl-2011-000464.
45. Good BM, Su AI. Crowdsourcing for bioinformatics. Bioinformatics. 2013:btt333. doi: 10.1093/bioinformatics/btt333.
46. Villaverde AF, Egea JA, Banga JR. A cooperative strategy for parameter estimation in large scale systems biology models. BMC Syst Biol. 2012;6(1):1. doi: 10.1186/1752-0509-6-75.
47. Kwiatkowska M, Norman G, Parker D. PRISM 4.0: Verification of probabilistic real-time systems. Comput Aided Verification. Springer; 2011. pp. 585–591.

[R1] 1.Reed JL, Patel TR, Chen KH, et al. Systems approach to refining genome annotation. Proc Natl Acad Sci U S A. 2006;103(46):17 480–17 484. doi: 10.1073/pnas.0603364103. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Lee JW, Na D, Park JM, et al. Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat Chem Biol. 2012;8(6):536–546. doi: 10.1038/nchembio.970. [DOI] [PubMed] [Google Scholar]

[R3] 3.Lee DS, Burd H, Liu J, et al. Comparative genome-scale metabolic reconstruction and flux balance analysis of multiple Staphylococcus aureus genomes identify novel antimicrobial drug targets. J Bacteriol. 2009;191(12):4015–4024. doi: 10.1128/JB.01743-08. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Carrera J, Covert MW. Why build whole-cell models? Trends Cell Biol. 2015;25(12):719–722. doi: 10.1016/j.tcb.2015.09.004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Macklin DN, Ruggero NA, Covert MW. The future of whole-cell modeling. Curr Opin Biotechnol. 2014;28:111–115. doi: 10.1016/j.copbio.2014.01.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Karr JR, Takahashi K, Funahashi A. The principles of whole-cell modeling. Curr Opin Microbiol. 2015;27:18–24. doi: 10.1016/j.mib.2015.06.004. [DOI] [PubMed] [Google Scholar]

[R7] 7.Karr JR, Williams AH, Zucker JD, et al. Summary of the DREAM8 Parameter Estimation Challenge: Toward parameter identification for whole-cell models. PLoS Comput Biol. 2015;11(5):e1004096. doi: 10.1371/journal.pcbi.1004096. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Hucka M, Nickerson DP, Bader GD, et al. Promoting coordinated development of community-based information standards for modeling in biology: the COMBINE initiative. Frontiers in Bioengineering and Biotechnology. 2015;3(19) doi: 10.3389/fbioe.2015.00019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Klipp E, Liebermeister W, Helbig A, et al. Systems biology standards—the community speaks. Nat Biotechnol. 2007;25(4):390–391. doi: 10.1038/nbt0407-390. [DOI] [PubMed] [Google Scholar]

[R10] 10.Büchel F, Rodriguez N, Swainston N, et al. Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC Syst Biol. 2013;7(1):116. doi: 10.1186/1752-0509-7-116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Karr JR, Sanghvi JC, Macklin DN, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150(2):389–401. doi: 10.1016/j.cell.2012.05.044. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Sanghvi JC, Regot S, Carrasco S, et al. Accelerated discovery via a whole-cell model. Nat Methods. 2013;10(12):1192–1195. doi: 10.1038/nmeth.2724. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Purcell O, Jain B, Karr JR, et al. Towards a whole-cell modeling approach for synthetic biology. Chaos. 2013;23(2):025112. doi: 10.1063/1.4811182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Kazakiewicz D, Karr JR, Langner K, Plewczynski D. A combined systems and structural modeling approach repositions antibiotics for mycoplasma genitalium. Comput Biol Chem. 2015;59:91–97. doi: 10.1016/j.compbiolchem.2015.07.007. [DOI] [PubMed] [Google Scholar]

[R15] 15.Karr JR, Sanghvi JC, Macklin DN, et al. Whole-CellKB: model organism databases for comprehensive whole-cell models. Nucleic Acids Res. 2013;41(Database issue):D787–D792. doi: 10.1093/nar/gks1108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Karr JR, Phillips NC, Covert MW. Whole-CellSimDB: a hybrid relational/hdf database for whole-cell model predictions. Database. 2014;2014:bau095. doi: 10.1093/database/bau095. no. pii. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Lee R, Karr JR, Covert MW. WholeCellViz: data visualization for whole-cell models. BMC Bioinformatics. 2013;14:253. doi: 10.1186/1471-2105-14-253. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Chelliah V, Juty N, Ajmera I, et al. BioModels: ten-year anniversary. Nucleic Acids Res. 2015;43(D1):D542–D548. doi: 10.1093/nar/gku1181. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Hucka M, Finney A, Sauro HM, et al. The Systems Biology Markup Language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015. [DOI] [PubMed] [Google Scholar]

[R20] 20.Hedley WJ, Nelson MR, Bullivant DP, Nielson PF. A short introduction to CellML. Philos Trans R Soc Lond A. 2001;359:1073–1089. [Google Scholar]

[R21] 21.Waltemath D, Adams R, Bergmann F, et al. Reproducible computational biology experiments with SED-ML—the Simulation Experiment Description Markup Language. BMC Syst Biol. 2011;5(1):198. doi: 10.1186/1752-0509-5-198. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Le Novère N, Hucka M, Mi H, et al. The Systems Biology Graphical Notation. Nat Biotechnol. 2009;27:735–741. doi: 10.1038/nbt.1558. [DOI] [PubMed] [Google Scholar]

[R23] 23.Le Novère N, Hucka M, Mi H, et al. The Systems Biology Graphical Notation. Nat Biotechnol. 2009;27(8):735–741. doi: 10.1038/nbt.1558. [DOI] [PubMed] [Google Scholar]

[R24] 24.Hucka M, Bergmann FT, Hoops S, et al. The Systems Biology Markup Language (SBML): Language specification for Level 3 Version 1 core. Journal of Integrative Bioinformatics. 2015;12(2):266. doi: 10.2390/biecoll-jib-2015-266. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Waltemath D, Bergmann FT, Chaouiya C, et al. Meeting report from the fourth meeting of the Computational Modeling in Biology Network (COMBINE) Stand Genomic Sci. 2014;9(3) doi: 10.1186/s40793-018-0320-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Smith LP, Hucka M, Hoops S, et al. SBML level 3 package: Hierarchical model composition, version 1 release 3. Journal of Integrative Bioinformatics. 2015;12(2):268. doi: 10.2390/biecoll-jib-2015-268. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Moodie SL, Smith LP, Le Novère N, et al. [accessed: 2016-02-26];The distributions package for SBML level 3. 2015 [Online]. Available: http://sourceforge.net/p/sbml/code/HEAD/tree/trunk/specifications/sbml-level-3/version-1/distrib/sbml-level-3-distrib-package-proposal.pdf?format=raw.

[R28] 28.Olivier BG, Bergmann FT. The Systems Biology Markup Language (SBML) level 3 package: Flux balance constraints. Journal of Integrative Bioinformatics. 2015;12(2):269. doi: 10.2390/biecoll-jib-2015-269. [DOI] [PubMed] [Google Scholar]

[R29] 29.Schaff JC, Lakshminarayana A, Smith LP, et al. SBML level 3 package: Spatial processes. 2015. [Google Scholar]

[R30] 30.Funahashi A, Matsuoka Y, Jouraku A, et al. CellDe-signer 3.5: a versatile modeling tool for biochemical networks. Proc IEEE. 2008;96(8):1254–1265. [Google Scholar]

[R31] 31.Rohn H, Junker A, Hartmann A, et al. VANTED v2: a framework for systems biology applications. BMC Syst Biol. 2012;6:139. doi: 10.1186/1752-0509-6-139. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] 32.Smith LP, Bergmann FT, Chandran D, Sauro HM. Antimony: a modular model definition language. Bioinformatics. 2009;25(18):2452–2454. doi: 10.1093/bioinformatics/btp401. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] 33.Kolpakov F. BioUML: visual modeling, automated code generation and simulation of biological systems. Proc BGRS. 2006;3:281–285. [Google Scholar]

[R34] 34.Ebrahim A, Lerman JA, Palsson BO, Hyduke DR. COBRApy: constraints-based reconstruction and analysis for python. BMC Syst Biol. 2013;7(1):74. doi: 10.1186/1752-0509-7-74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] 35.Hoops S, Sahle S, Lee C, et al. COPASI — a COmplex PAthway SImulator. Bioinformatics. 2006;22:3067–3074. doi: 10.1093/bioinformatics/btl485. [DOI] [PubMed] [Google Scholar]

[R36] 36.Madsen C, Myers CJ, Patterson T, et al. Design and test of genetic circuits using iBioSim. IEEE Des Test Comput. 2012;29(3) [Google Scholar]

[R37] 37.Somogyi ET, Bouteiller J-M, Glazier JA, et al. libRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics. 2015 doi: 10.1093/bioinformatics/btv363. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] 38.Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977;81(25):2340–2361. [Google Scholar]

[R39] 39.Hastings J, Owen G, Dekker A, et al. Chebi in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44(D1):D1214–D1219. doi: 10.1093/nar/gkv1031. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] 40.Heller SR, McNaught A, Stein S, et al. Inchi-the worldwide chemical structure identifier standard. J Cheminform. 2013;5(1):7. doi: 10.1186/1758-2946-5-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] 41.Czauderna T, Klukas C, Schreiber F. Editing, validating and translating of sbgn maps. Bioinformatics. 2010;26(18):2340–2341. doi: 10.1093/bioinformatics/btq407. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] 42. Goldberg AP, Chew YH, Karr JR. Toward scalable whole-cell modeling of human cells. Prin Adv Discret Simul. ACM SIGSIM; 2016.

[R43] 43. Zhang F, Meier-Schellersheim M. SBML Level 3 Package Specification: Multistate/Multicomponent Species (Version 1, Release 0.1 Draft 369). 2015. [Online]. Available: http://sbml.org/Documents/Specifications/SBML_Level_3/Packages/multi [accessed: 2015-05-25].

[R44] 44. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–551. doi: 10.1136/amiajnl-2011-000464.

[R45] 45. Good BM, Su AI. Crowdsourcing for bioinformatics. Bioinformatics. 2013:btt333. doi: 10.1093/bioinformatics/btt333.

[R46] 46. Villaverde AF, Egea JA, Banga JR. A cooperative strategy for parameter estimation in large scale systems biology models. BMC Syst Biol. 2012;6(1):1. doi: 10.1186/1752-0509-6-75.

[R47] 47. Kwiatkowska M, Norman G, Parker D. PRISM 4.0: Verification of probabilistic real-time systems. Comput Aided Verification. Springer; 2011. pp. 585–591.
