Abstract
Mathematical and computational models are a key technology in systems biology. Progress in the field depends on the replicability and reproducibility of their properties and behavior. For this, an essential requirement is a set of clear standards for model specification and dissemination. This review covers existing standards, and it highlights the most important areas where further work is required. This includes the specification of agent-based models, an increasingly common modeling approach.
Introduction
Dynamic computational models represent a key enabling technology in systems biology that provides a powerful methodology for encoding the interactions of the different parts of a biological system as they generate its emergent dynamics. As with experimental approaches, model-based research must be reproducible and credible. For maximum impact and efficiency, models must be shareable and transparent, especially as the need for large-scale integrated models grows, such as for the development of medical digital twins [1]. This naturally raises the issue of best practices in model specification and dissemination. This brief critical review describes the current state of affairs, some successes, and the most pressing needs to be addressed.
One of the most prevalent modeling techniques has been the use of systems of ordinary differential equations (ODEs). ODEs have a long tradition as a modeling technique for, e.g., biochemical reaction networks or gene regulatory networks, and have benefited from sustained and largely successful efforts to develop best practices and standards for their specification. But other model types have also gained popularity, such as discrete models of various types, constraint-based models, and, in particular, agent-based models (ABMs). An additional specification challenge arises with multi-scale hybrid models, especially those that combine mechanistic and data-driven models. Although such models are of increasing importance [2], they are beyond the scope of this review.
Model credibility is important in systems biology, whether models are used to guide experiments, drug development, or the optimization of patient treatment. This is particularly urgent because many data-driven models are black boxes, and the sheer complexity of biological systems makes reverse-engineering from a relatively small set of observations difficult [3]. Replicability and reproducibility, directly connected to credibility, are generally considered challenges of both experimental and computational research [4, 5]. Even for models based on systems of differential equations, reproducibility is far from assured, as was shown in a study based on the BioModels database [6]. The situation for computational simulation models, such as ABMs, is considerably worse. Listing all the details of how an ABM is constructed can partially alleviate the problem, but the complexity of such a description is itself a significant obstacle [6, 7]. Here, we focus primarily on the issue of standards for model specification.
Existing Model Specification Standards
Several ongoing standards development efforts are coordinated by the Computational Modeling in Biology Network (COMBINE) [8]. These include the Systems Biology Markup Language (SBML) [9, 10], the Systems Biology Graphical Notation (SBGN) [11–13], and others. Arguably, SBML, under development for two decades, represents the state of the art in this area, even though its core covers only models based on systems of ordinary differential equations; extensions cover constraint-based models [14], logical models, and rule-based models [13]. It is commonly used in model databases such as BioModels [15], which contains several thousand models, and it plays an important role in a larger ongoing effort to improve the reproducibility of model-based results [16]. In 2013, a community effort produced SBML qual, an extension of SBML to a wide class of time- and state-discrete models [17] that includes logical models, in particular Boolean networks and their generalizations, collectively referred to as qualitative models. SBML qual is used in model databases such as the Cell Collective [18].
Rule-based computational models
Rule-based models of multicellular and other systems, which are becoming increasingly common, are among the modeling techniques not covered by the standards discussed above and not currently included in initiatives focused on model reproducibility, such as the recently established Center for Reproducible Biomedical Modeling. The most prominent example is agent-based models (ABMs). Only minimal mathematical expertise is needed to construct them, especially with the availability of general-purpose software platforms with low barriers to entry, such as NetLogo [19]. Even more complex platforms for virtual tissue models, such as CompuCell3D [20] or PhysiCell [21], are accessible to domain experts without extensive modeling experience. These platforms make it easy to simulate stochastic processes in spatially heterogeneous environments, and ABMs avoid global simplifying assumptions such as the well-mixed assumption underlying many ODE models. ABMs, initially popular in the social sciences, ecology, and epidemiology, later spread to systems biology and biomedicine, with a wide range of applications such as respiratory diseases [22], tumor growth [23], and liver fibrosis [24]. The cost of this ease of use and broad applicability, coupled with the absence of a mathematically rigorous formalism, is that standards for model specification are exceedingly difficult to develop. The problem is compounded by the increasing use of ABMs within multiscale models that combine, e.g., ODE-based intracellular models with ABM tissue models (see, e.g., [22]). There is thus an urgent need for model specification standards for this modeling technique to ensure reproducibility of results.
Although ABMs, sometimes referred to as individual-based models, are widely used, the state of the art in their specification leaves much to be desired [25]. In essence, ABMs are simulation models encoded in a variety of programming languages. They are most commonly specified by a text description, and in the majority of cases the actual code is not provided [26, 27]. Even when both are available, verifying the concordance between the text description and its implementation in code is far from straightforward, and this is rarely done at the manuscript review stage [28]. Fewer than half of the publications reporting ABMs explicitly state which programming language or platform was used, which further limits the ability to replicate results [27], let alone reproduce them. Conclusions drawn from an ABM are difficult to interpret if its emergent behavior differs significantly depending on the platform on which it was implemented, as has been shown for a model from ecology [29]. Finally, a significant problem with natural-language descriptions is that the same described procedure can often be implemented in many different ways [28, 30].
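To illustrate how a single sentence can admit several implementations, consider a hypothetical rule of the kind that might appear in a text description: "At each time step, a healthy cell with an infected neighbor becomes infected." The minimal Python sketch below is not taken from any of the cited studies; it implements two defensible readings of that rule on a one-dimensional row of cells. In the synchronous reading every cell sees the state from the previous step; in the sequential reading cells are visited left to right and updated in place. Starting from the same state, the two versions already disagree after one step.

```python
"""Two readings of one natural-language ABM rule (hypothetical example).

Rule as it might appear in a text description:
    "At each time step, a healthy cell with an infected neighbor becomes infected."
Cells are 0 (healthy) or 1 (infected) on a 1-D row without wrap-around.
"""
from typing import List


def step_synchronous(cells: List[int]) -> List[int]:
    # Reading A: every cell inspects the state from the previous step,
    # and all updates are written to a separate copy.
    n = len(cells)
    new = cells.copy()
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        if cells[i] == 0 and (left or right):
            new[i] = 1
    return new


def step_sequential(cells: List[int]) -> List[int]:
    # Reading B: cells are visited left to right and updated in place,
    # so a cell can see infections created earlier in the same step.
    cells = cells.copy()
    n = len(cells)
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        if cells[i] == 0 and (left or right):
            cells[i] = 1
    return cells


initial = [1, 0, 0, 0, 0]
print(step_synchronous(initial))  # [1, 1, 0, 0, 0]
print(step_sequential(initial))   # [1, 1, 1, 1, 1]
```

Neither reading is wrong; the text simply does not say which update scheme is intended, which is exactly the kind of detail a specification standard would need to pin down.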
The main tool available for ABM specification is the Overview, Design Concepts, and Details (ODD) protocol [31–33], first introduced in the context of ecological modeling. It is a text-based (and, in particular, not necessarily machine-readable) description in a prescribed format, giving an overview of the model, its entities, and the processes driving the behavior of individual entities; when crafted carefully, it can be effective in specifying model logic. However, the ODD protocol has been criticized as neither comprehensive nor precise enough to make ABMs truly reproducible, and it does not require the platform-specific details needed to replicate models [34]. Moreover, as mentioned above, it does not address the important issue of mapping the description onto the actual simulation code. The latest update of the ODD protocol suggests developing automated links between ODD descriptions and the software implementing the model to reduce ambiguity, although its authors note several caveats [33].
One effort to specify ABMs graphically uses class diagrams of the Unified Modeling Language (UML), a notation primarily used to describe object-oriented software [35]. Interestingly, the first update of the ODD protocol advised against using UML to describe ABM structure, so that ODD descriptions remain independent of how models are actually implemented [32]. More recently, there have been discussions about representing ABMs in SBML [36], although to our knowledge there is no current standard way of doing so, which limits the ability to share, test, and replicate ABMs.
Conclusions and future directions
It has long been recognized that computational models require specification standards to ensure replicability and reproducibility of model simulations. Much progress has been made for models based on systems of ordinary differential equations, owing both to their ubiquity and to their well-defined universal mathematical structure. The Systems Biology Markup Language, together with packages that extend the capability of SBML core, can be considered the gold standard of model description, despite some remaining challenges. Other relatively common model types, such as constraint-based models and logical models, are also covered by extensions of SBML. The main outstanding challenge concerns ABMs, for which the only general standard currently available is the ODD protocol.
A concerted effort is needed to transfer the lessons learned from developing existing standards to ABMs. Standards are needed not only for the text description but also for the computer code that implements the actual simulation model, and they have to enable the unambiguous mapping of components in the text to the matching components in the code. Ideally, machine-readable standards for ABMs should be developed, together with tools that support them. Another important step is the organization of ABM repositories beyond platform-dependent ones such as those for NetLogo, similar to BioModels for ODE and constraint-based models and the Cell Collective for discrete models. For this to be practical, standards for model description are also essential. We encourage the systems biology community to take on this challenge.
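As a hedged illustration of what a machine-checkable mapping between text and code might look like, the following Python sketch registers each simulation function under the name of the process it implements and compares that registry against a list of process names taken from a description (for example, the process overview section of an ODD document). The process names, the `implements` decorator, and `check_concordance` are all hypothetical and do not correspond to any existing standard or tool.

```python
from typing import Callable, Dict, List

# Hypothetical machine-readable description: process names as they might
# appear in the process overview of an ODD-style text.
DESCRIBED_PROCESSES: List[str] = ["cell division", "cell death", "chemokine diffusion"]

# Registry linking each described process name to the function implementing it.
IMPLEMENTATIONS: Dict[str, Callable[[dict], None]] = {}


def implements(process_name: str):
    """Decorator recording which described process a function implements."""
    def register(func: Callable[[dict], None]) -> Callable[[dict], None]:
        if process_name in IMPLEMENTATIONS:
            raise ValueError(f"process {process_name!r} implemented twice")
        IMPLEMENTATIONS[process_name] = func
        return func
    return register


@implements("cell division")
def divide_cells(state: dict) -> None:
    state["n_cells"] *= 2


@implements("cell death")
def remove_dead_cells(state: dict) -> None:
    state["n_cells"] -= state.get("n_dead", 0)


def check_concordance(described: List[str]) -> Dict[str, List[str]]:
    """Report described processes without code, and code without a description."""
    return {
        "missing_implementation": [p for p in described if p not in IMPLEMENTATIONS],
        "undocumented_code": [p for p in IMPLEMENTATIONS if p not in described],
    }


print(check_concordance(DESCRIBED_PROCESSES))
# {'missing_implementation': ['chemokine diffusion'], 'undocumented_code': []}
```

A check of this kind only verifies that names line up; it says nothing about whether each function does what the text claims, so it would complement rather than replace careful review.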
One possible starting point is the modular structure for simulation models described in [37], which can provide a template for both the text description and the code structure. The key feature of this structure is the strict separation between the model's computational components and the data describing the model state. All computational algorithms operate on a global model state that contains all data, all parameters, and all spatial specifications. Algorithms neither interact with each other directly nor exchange data; instead, they interact indirectly by modifying overlapping data fields. This decomposition of the simulation into modular pieces, together with an ODD-like description, gives the model specification a clear and discernible structure.
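The following minimal Python sketch shows this architectural idea in simplified form: a single global state object, and modules that are plain functions reading and writing that state without calling each other. It is not the code of the framework in [37]; the `State` class, the two example modules, the field names, and the parameter values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class State:
    """Global model state: all data, all parameters, all spatial specifications."""
    time: float = 0.0
    parameters: Dict[str, float] = field(default_factory=dict)
    grid_shape: Tuple[int, int] = (10, 10)                    # spatial specification
    fields: Dict[str, float] = field(default_factory=dict)    # shared data fields


# A module is any function that reads and writes the global state.
# Modules never call each other or pass data to each other directly.
Module = Callable[[State, float], None]


def secretion(state: State, dt: float) -> None:
    # Writes the shared "cytokine" field.
    rate = state.parameters["secretion_rate"]
    state.fields["cytokine"] = state.fields.get("cytokine", 0.0) + rate * dt


def recruitment(state: State, dt: float) -> None:
    # Reads "cytokine" (written by `secretion`) and writes "immune_cells".
    rate = state.parameters["recruitment_rate"]
    recruited = rate * state.fields.get("cytokine", 0.0) * dt
    state.fields["immune_cells"] = state.fields.get("immune_cells", 0.0) + recruited


def run(state: State, modules: List[Module], dt: float, n_steps: int) -> State:
    # Each step applies every module in order to the same global state.
    for _ in range(n_steps):
        for module in modules:
            module(state, dt)
        state.time += dt
    return state


if __name__ == "__main__":
    s = State(parameters={"secretion_rate": 1.0, "recruitment_rate": 0.5})
    run(s, [secretion, recruitment], dt=1.0, n_steps=2)
    print(s.time, s.fields)  # 2.0 {'cytokine': 2.0, 'immune_cells': 1.5}
```

Here `recruitment` depends on `secretion` only through the shared `cytokine` field, so either module could be replaced, tested, or described on its own as long as it reads and writes the same fields. The order of the module list still matters and would itself need to be part of the specification.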
Figure 1.
Work still needs to be done to make models truly reproducible, although much progress has been made for models based on ODEs. Standardization of ABMs is still in its infancy, limiting studies of the replicability and reproducibility of such models and, as a result, limiting their credibility and extensibility.
Acknowledgements
RL was partially supported by grants NIH 1U01EB024501, NSF CBET-1750183, NIH 1R01AI135128, and NIH 1R01GM127909. LSV was partially supported by grants NIH 1U01EB024501 and NIH R01MH117114. The authors thank T. Helikar and R. Sheriff for helpful comments on an earlier version of the manuscript.
Works Cited
(•) indicates “important”; (••) indicates “very important.”
- 1. Laubenbacher R, Sluka JP, Glazier JA: Using digital twins in viral infection. Science 2021, 371:1105–1106.
- (•) 2. Alber M, Buganza Tepole A, Cannon WR, De S, Dura-Bernal S, Garikipati K, Karniadakis G, Lytton WW, Perdikaris P, Petzold L, et al.: Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. npj Digit Med 2019, 2:115. This perspective argues for the integration of data-driven and physics-based modeling. Known mechanisms can constrain machine-learning and artificial intelligence algorithms. Conversely, machine learning can enhance our capabilities of inferring physical descriptions of phenomena.
- 3. Erdemir A, Mulugeta L, Ku JP, Drach A, Horner M, Morrison TM, Peng GCY, Vadigepalli R, Lytton WW, Myers JG: Credible practice of modeling and simulation in healthcare: ten rules from a multidisciplinary perspective. J Transl Med 2020, 18:369.
- 4. Baker M: 1,500 scientists lift the lid on reproducibility. Nature 2016, 533:452–454.
- (•) 5. Fitzpatrick BG: Issues in Reproducible Simulation Research. Bull Math Biol 2019, 81:1–6. Part of a collection of articles on reproducibility. It discusses reproducibility in the context of agent-based models, and provides a set of concrete guidelines for their construction and use.
- (••) 6. Tiwari K, Kananathan S, Roberts MG, Meyer JP, Sharif Shohan MU, Xavier A, Maire M, Zyoud A, Men J, Ng S, et al.: Reproducibility in systems biology modelling. Mol Syst Biol 2021, 17. An assessment of reproducibility of models in systems biology, using a collection of models from the BioModels database.
- 7. Mendes P: Reproducible Research Using Biomodels. Bull Math Biol 2018, 80:3081–3087.
- 8. Schreiber F, Sommer B, Bader GD, Gleeson P, Golebiewski M, Hucka M, Keating SM, König M, Myers C, Nickerson D, et al.: Specifications of Standards in Systems and Synthetic Biology: Status and Developments in 2019. J Integr Bioinform 2019, 16.
- 9. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, and the rest of the SBML Forum (Arkin AP, Bornstein BJ, Bray D, et al.): The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19:524–531.
- 10. Hucka M, Bergmann FT, Chaouiya C, Dräger A, Hoops S, Keating SM, König M, Novère NL, Myers CJ, Olivier BG, et al.: The Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 2 Core Release 2. J Integr Bioinform 2019, 16.
- 11. Rougny A, Touré V, Moodie S, Balaur I, Czauderna T, Borlinghaus H, Dogrusoz U, Mazein A, Dräger A, Blinov ML, et al.: Systems Biology Graphical Notation: Process Description language Level 1 Version 2.0. J Integr Bioinform 2019, 16.
- (••) 12. Zhang F, Meier-Schellersheim M: SBML Level 3 package: Multistate, Multicomponent and Multicompartment Species, Version 1, Release 1. J Integr Bioinform 2018, 15:20170077. An update on SBML and the challenges presented by multi-scale models of whole cells and organs, as well as new data types such as single cell measurements and live imaging.
- 13. Keating SM, Waltemath D, König M, Zhang F, Dräger A, Chaouiya C, Bergmann FT, Finney A, Gillespie CS, Helikar T, et al.: SBML Level 3: an extensible format for the exchange and reuse of biological models. Mol Syst Biol 2020, 16.
- 14. Olivier BG, Bergmann FT: SBML Level 3 Package: Flux Balance Constraints version 2. J Integr Bioinform 2018, 15.
- (•) 15. Malik-Sheriff RS, Glont M, Nguyen TVN, Tiwari K, Roberts MG, Xavier A, Vu MT, Men J, Maire M, Kananathan S, et al.: BioModels-15 years of sharing computational models in life science. Nucleic Acids Res 2020, 48:D407–D415. A detailed description of the BioModels database and its features.
- 16. Papin JA, Mac Gabhann F, Sauro HM, Nickerson D, Rampadarath A: Improving reproducibility in computational biology research. PLoS Comput Biol 2020, 16:e1007881.
- 17. Chaouiya C, Bérenguier D, Keating SM, Naldi A, van Iersel MP, Rodriguez N, Dräger A, Büchel F, Cokelaer T, Kowal B, et al.: SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol 2013, 7:135.
- 18. Helikar T, Kowal B, McClenathan S, Bruckner M, Rowley T, Madrahimov A, Wicks B, Shrestha M, Limbu K, Rogers JA: The Cell Collective: Toward an open and collaborative approach to systems biology. BMC Syst Biol 2012, 6:96.
- 19. Wilensky U, Rand W: An introduction to agent-based modeling: modeling natural, social, and engineered complex systems with NetLogo. The MIT Press; 2015.
- 20. Swat MH, Thomas GL, Belmonte JM, Shirinifard A, Hmeljak D, Glazier JA: Multi-scale modeling of tissues using CompuCell3D. Methods Cell Biol 2012, 110:325–366.
- 21. Ghaffarizadeh A, Heiland R, Friedman SH, Mumenthaler SM, Macklin P: PhysiCell: An open source physics-based cell simulator for 3-D multicellular systems. PLoS Comput Biol 2018, 14:e1005991.
- 22. Kirschner D, Pienaar E, Marino S, Linderman JJ: A review of computational and mathematical modeling contributions to our understanding of Mycobacterium tuberculosis within-host infection and treatment. Curr Opin Syst Biol 2017, 3:170–185.
- 23. Norton K-A, Wallace T, Pandey NB, Popel AS: An agent-based model of triple-negative breast cancer: the interplay between chemokine receptor CCR5 expression, cancer stem cells, and hypoxia. BMC Syst Biol 2017, 11:68.
- 24. Dutta-Moscato J, Solovyev A, Mi Q, Nishikawa T, Soto-Gutierrez A, Fox IJ, Vodovotz Y: A Multiscale Agent-Based in silico Model of Liver Fibrosis Progression. Front Bioeng Biotechnol 2014, 2:18.
- 25. Hauke J, Achter S, Meyer M: Theory Development Via Replicated Simulations and the Added Value of Standards. JASSS 2020, 23:12.
- 26. Janssen MA: The Practice of Archiving Model Code of Agent-Based Models. JASSS 2017, 20:2.
- 27. Janssen MA, Pritchard C, Lee A: On code sharing and model documentation of published individual and agent-based models. Environ Model Softw 2020, 134:104873.
- 28. Chanda SS, Miller KD: Replicating agent-based models: Revisiting March’s exploration–exploitation study. Strategic Organization 2019, 17:425–449.
- (••) 29. Donkin E, Dennis P, Ustalakov A, Warren J, Clare A: Replicating complex agent based models, a formidable task. Environ Model Softw 2017, 92:142–151. A replication study showing that the same agent-based model code can provide substantially different model behavior when executed on different widely available modeling platforms. This highlights the challenges of making agent-based models reproducible.
- 30. Wilensky U, Rand W: Making models match: Replicating an agent-based model. JASSS 2007, 10.
- 31. Grimm V, Berger U, Bastiansen F, Eliassen S, Ginot V, Giske J, Goss-Custard J, Grand T, Heinz SK, Huse G, et al.: A standard protocol for describing individual-based and agent-based models. Ecological Modelling 2006, 198:115–126.
- 32. Grimm V, Berger U, DeAngelis DL, Polhill JG, Giske J, Railsback SF: The ODD protocol: A review and first update. Ecological Modelling 2010, 221:2760–2768.
- (••) 33. Grimm V, Railsback SF, Vincenot CE, Berger U, Gallagher C, DeAngelis DL, Edmonds B, Ge J, Giske J, Groeneveld J, et al.: The ODD Protocol for Describing Agent-Based and Other Simulation Models: A Second Update to Improve Clarity, Replication, and Structural Realism. JASSS 2020, 23:7. An improved version of the ODD protocol for the specification of agent-based models.
- 34. Amouroux E, Gaudou B, Desvaux S, Drogoul A: O.D.D.: A Promising but Incomplete Formalism for Individual-Based Model Specification. In 2010 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF). IEEE; 2010:1–4.
- 35. Bersini H: UML for ABM. JASSS 2012, 15:9.
- 36. Watanabe L, Barhak J, Myers C: Toward reproducible disease models using the Systems Biology Markup Language. SIMULATION 2019, 95:895–930.
- 37. Masison J, Beezley J, Mei Y, Ribeiro H, Knapp AC, Sordo Vieira L, Adhikari B, Scindia Y, Grauer M, Helba B, et al.: A modular computational framework for medical digital twins. Proc Natl Acad Sci U S A 2021, 118:e2024287118.