
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Feb 21:arXiv:2502.15597v1. [Version 1]

From FAIR to CURE: Guidelines for Computational Models of Biological Systems

Herbert M Sauro 1,4,*, Eran Agmon 10, Michael L Blinov 10, John H Gennari 23, Joe Hellerstein 4, Adel Heydarabadipour 1, Peter Hunter 14, Bartholomew E Jardine 1, Elebeoba May 31, David P Nickerson 14, Lucian P Smith 1, Gary D Bader 24, Frank Bergmann 34, Patrick M Boyle 1,3,4,5, Andreas Dräger 18,19,20, James R Faeder 28, Song Feng 13, Juliana Freire 22, Fabian Fröhlich 29, James A Glazier 2, Thomas E Gorochowski 6, Tomas Helikar 38, Stefan Hoops 35, Princess Imoukhuede 1, Sarah M Keating 33, Matthias Konig 32, Reinhard Laubenbacher 30, Leslie M Loew 10, Carlos F Lopez 17, William W Lytton 7,8, Andrew McCulloch 39, Pedro Mendes 10, Chris J Myers 27, Jerry G Myers 9, Lealem Mulugeta 25,26, Anna Niarakis 15,16, David D van Niekerk 31, Brett G Olivier 11, Alexander A Patrie 10, Ellen M Quardokus 2, Nicole Radde 40, Johann M Rohwer 37, Sven Sahle 36, James C Schaff 10, TJ Sego 30, Janis Shin 1, Jacky L Snoep 37, Rajanikanth Vadigepalli 12, H Steve Wiley 13, Dagmar Waltemath 41, Ion Moraru 10
PMCID: PMC11875277  PMID: 40034129

Abstract

Guidelines for managing scientific data have been established under the FAIR principles requiring that data be Findable, Accessible, Interoperable, and Reusable. In many scientific disciplines, especially computational biology, both data and models are key to progress. For this reason, and recognizing that such models are a very special type of “data”, we argue that computational models, especially mechanistic models prevalent in medicine, physiology and systems biology, deserve a complementary set of guidelines. We propose the CURE principles, emphasizing that models should be Credible, Understandable, Reproducible, and Extensible. We delve into each principle, discussing verification, validation, and uncertainty quantification for model credibility; the clarity of model descriptions and annotations for understandability; adherence to standards and open science practices for reproducibility; and the use of open standards and modular code for extensibility and reuse. We outline recommended and baseline requirements for each aspect of CURE, aiming to enhance the impact and trustworthiness of computational models, particularly in biomedical applications where credibility is paramount. Our perspective underscores the need for a more disciplined approach to modeling, aligning with emerging trends such as Digital Twins and emphasizing the importance of data and modeling standards for interoperability and reuse. Finally, given the non-trivial effort required to implement these guidelines, we emphasize that the community should move to automate as many of them as possible.

1. Introduction

Wilkinson et al. in 2016 [1] made a good case (dare we say a fair case) for establishing guidelines for the management of scientific data. They arrived at four guiding principles enshrined in the acronym FAIR, namely that data be Findable, Accessible, Interoperable, and Reusable. With the rapid growth of computational modeling, especially the development of mechanistic models of physiological and cellular systems, the question arises of how these principles can be extended so that they can succinctly describe best practices for model management (e.g., model development, model selection, and model interpretation). In this perspective, we introduce a set of complementary guidelines to FAIR that address the specific needs for mechanistic models. We identify four principles: Credibility, Understandability, Reproducibility, and Extensibility. We refer to these as the CURE principles.

We focus on mechanistic models, thereby excluding the growing body of machine-learned (ML) and AI models that are based solely on data. The machine learning community has appropriately encouraged the use of FAIR principles when publishing ML models [2], with an emphasis on ensuring that data are accessible to those who wish to repeat the study. However, the reproducibility of those models is a separate topic that has its own special concerns (e.g., the selection of training, validation, and test data, as well as the choice of hyperparameters). We note in passing that although ML models rarely consider mechanisms, there are situations in which mechanistic models make use of machine learning approaches, such as in the context of parameter estimation or physically informed neural networks (PINNs) [3].

Although the focus of this proposal is on models from the systems biology community, the guidelines and sentiments we describe are broadly applicable to other modeling domains.

2. Why have Guidelines?

Models are indispensable in many science and engineering disciplines. Examples include circuit simulation in electrical engineering, models of fluid flow in mechanical engineering, and weather modeling in atmospheric science. In some cases, modeling has progressed to the development of digital twins [4, 5], in which a simulation model is designed to replicate and interact with the physical system, or virtual populations, which can be used for applications such as clinical trial design [6]. Biological modeling has not yet risen to that same level of sophistication, with the possible exception of areas such as protein folding [7] and molecular dynamics [8] where physics and chemistry play a more important role. Even so, biological modeling is a rapidly evolving field. As the field grows, guidelines in the spirit of FAIR will help modelers create more impactful and credible models. We believe these guidelines will be of importance for models that ultimately reach the clinic and particularly for the growing interest in developing biomedical Digital Twins. The ‘Building Immune Digital Twins’ working group is addressing these questions, focusing on the human immune system and its responses in various pathological contexts [9, 10].

3. Existing Guidelines

Several groups have proposed guidelines to improve best practices in creating biological models over the last 20 years. Of particular note is the creation of standardised languages for biological models such as Systems Biology Markup Language (SBML) [11], CellML [12] and NeuroML [13]. These are machine-readable formats that are an explicit representation of the model. By explicit representation, we mean that the model representation only includes elements that are essential for modeling; it does not include implementation details related to simulation. For example, the essential characteristics of a mechanistic model of a well-mixed biochemical system include chemical species, reactions, and rate laws. It does not include software details such as file input/output and control logic for numerical solvers. An explicit representation is independent of its implementation in software.

The choice of an explicit representation for models is driven by the requirements of the communities that develop and use the models. SBML focuses on biochemical models where the representation is in terms of the biological processes. CellML focuses on a mathematical representation of models as differential-algebraic equations. NeuroML focuses on representations of neural systems. These representations have become popular among modelers and software developers. For example, all genome-scale models [14] are represented using SBML, and thousands of kinetic models are now stored on publicly accessible model repositories using these formats [15, 16]. Standards such as SBML avoid the use of potentially confusing and unreusable ad hoc models, allowing models to persist in a reproducible form over long periods of time [17, 18]. Many authors, however, still publish models in executable formats such as MATLAB, Python, etc., which can pose problems for reproducibility and reuse, particularly when poorly documented [19, 20]. The logical modeling community, which uses the SBML Qualitative format (SBML-Qual) to encode logical and Boolean models, made efforts to define a roadmap toward the annotation and curation of logical models (aka the CALM initiative), including milestones for best practices and recommended standard requirements [21].

To promote data sharing and reuse, the FAIR principles recommend a data dictionary that specifies data types and semantics for data items [1]. An analogous requirement exists for models. For example, consider an SBML model with the reaction A → B. Annotations are used to define A, B, and provide information about the reaction (e.g., the organism, cell type, and organelle in which the reaction takes place). Annotation can provide additional ontological and reference information about a model, including its submodels, processes, assumptions, and provenance. Genome-scale models are heavily annotated with process metadata [22, 23], and the curators at BioModels [24] regularly add annotations to curated models. As part of these efforts, the systems biology and physiology community developed the MIRIAM standard, which describes the “Minimum Information Required In the Annotation of Models” [25]. MIRIAM applies to structured information, such as SBML or CellML, where annotation information can be included in a machine-readable manner. Such information can be of great utility for searching, merging, or disassembling a model into its component parts [26].

4. Mechanistic Models

The focus of this perspective is on mechanistic models. We define a mechanistic model as: a representation of a biological system that is described in terms of the constituent physical parts and processes that occur between the parts. For example, in a mechanistic model of a cell signaling pathway, we would specify the various proteins and their phosphorylation states and the transformations between these states via enzymatic processes involving kinases and phosphatases.

Often, a mechanistic model is transformed into a computational model by invoking physical laws, such as the conservation of mass and chemical kinetic laws that govern individual transformations. In physiology and systems biology, models are commonly represented as a system of ordinary differential equations [27, 28], but other representations are also used, such as systems of Boolean functions, graphs, stochastic systems, and constraint-based models [29]. Such models are often shared via model repositories such as BioModels [24], JWS Online [30], or BiGG [15] and KBase [16] for constraint-based models.
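
As a concrete, if toy, example of the path from mechanism to computation (the reaction, rate constant, and initial conditions below are our own choices, not taken from any cited model): invoking mass-action kinetics and conservation of mass for the single reaction A → B gives d[A]/dt = −k[A] and d[B]/dt = k[A], which can be integrated with a simple forward Euler scheme:

```python
# Toy mechanistic model: A -> B with mass-action kinetics, integrated
# with forward Euler. All values (k, A0, B0, dt) are illustrative.

def simulate(k=0.5, A0=10.0, B0=0.0, dt=0.01, t_end=10.0):
    A, B = A0, B0
    trajectory = [(0.0, A, B)]
    t = 0.0
    while t < t_end:
        dA = -k * A   # rate of loss of A (mass-action)
        dB = k * A    # conservation of mass: B gains what A loses
        A += dA * dt
        B += dB * dt
        t += dt
        trajectory.append((t, A, B))
    return trajectory

traj = simulate()
```

Note that under this scheme the total [A] + [B] is conserved exactly at every step, which is itself a simple verification check.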

Models can also be shared using raw executable formats such as Python or MATLAB that describe differential equations. Such ‘raw’ model code is typically provided as supplementary files to a published paper or stored in code repositories such as GitHub or specialized repositories such as ModelDB [31]. In some cases, the model may be described as part of the main text of the published article [32] or be absent altogether. This obviously makes the reproducibility of such work much more complex and sometimes impossible, given the frequency of typographical errors in printing mathematical equations or code fragments. Models-as-programs paradigms such as those followed by PySB and others encourage the use of best practices in Python coding and documentation through tools such as Sphinx, but these approaches rely on developer effort to document the code and model at an appropriate level of detail [33, 34]. They also tie a model to a specific language implementation, making reuse difficult and backward compatibility a problem.

It may be difficult to abstract the underlying physiological processes of some complex systems purely in explicit form, and few efforts have attempted to do so.

Multi-scale modeling tends to give rise to such challenging scenarios. The context of computational modeling of electrophysiological phenomena in the heart provides an illustrative example [35]. Ordinary differential equations describing cell-scale events like ion channel gating and the generation of action potentials can be encoded using CellML or as a biological description in SBML. However, representing the propagation of excitatory wavefronts requires spatial discretization of the governing partial differential equations via finite element analysis; explicit formats for such applications exist [36, 37] but have not seen widespread use. One approach to improving interoperability and reproducibility is to create tools for importing model components written in common data formats. For instance, the openCARP simulation environment [35] includes a CellML “translator” that facilitates on-the-fly incorporation of cell-scale representations of different types of cardiac electrophysiology (e.g., cardiac region, species, or disease-specific action potentials) within the fabric of the multi-scale simulation ecosystem.

Mechanistic models of biological systems often contain many parameters whose values are unknown (e.g., kinetic constants and diffusion rates). A popular approach to estimating these parameters is model fitting or calibration [28, 38–40]. This is where model parameters are adjusted, often with an optimization algorithm, so that the output of the model matches relevant experimental data. Because models tend to have many parameters, the problem is often, if not always, underdetermined. This means that the set of fitted parameters in a model is not unique. In some instances, one or more parameters may be non-identifiable, meaning no single value can be assigned to the parameter [41]. Solutions to this problem can involve simplifying the model to reduce the number of parameters and/or eliminate non-identifiable parameters; another approach is to collect additional experimental data. More recent methods have focused on Bayesian techniques to assess the uncertainty in parameter estimates [42, 43].
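
A minimal sketch of calibration (the decay model, noise level, and coarse grid search below are illustrative assumptions of our own; real studies would use a proper optimizer and report parameter uncertainty): adjust the rate constant k until the model output best matches noisy “experimental” data in a least-squares sense:

```python
# Illustrative model calibration: fit k in A(t) = A0 * exp(-k * t) to
# synthetic noisy data by minimizing the sum of squared errors (SSE)
# over a grid of candidate values. All names and values are our own.
import math
import random

random.seed(1)
A0, k_true = 10.0, 0.7
times = [0.5 * i for i in range(11)]
data = [A0 * math.exp(-k_true * t) + random.gauss(0, 0.1) for t in times]

def sse(k):
    """Sum of squared errors between model predictions and data."""
    return sum((A0 * math.exp(-k * t) - d) ** 2 for t, d in zip(times, data))

candidates = [i * 0.01 for i in range(1, 201)]   # k in (0, 2]
k_fit = min(candidates, key=sse)
```

Because the synthetic noise is small relative to the signal, the recovered k_fit lands close to the true value of 0.7; with sparser or noisier data it would not, which is exactly the identifiability concern raised above.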

As a side note, we contrast mechanistic models with the great variety of neural network models that can provide accurate predictions for complex problems [44]. However, the resulting “black box” models are almost always challenging to understand, a situation referred to as “intellectual debt” [45]. An example is the prediction of COVID-19 from chest x-rays [46]. A close examination of the model revealed that it ignored the medical features in the x-rays and instead relied on the coding of the hospital at which the imaging was done. This turned out to be a surrogate variable for the patient population since one hospital had more COVID patients than the other. We cite this as an example of a model with accurate predictions that was not credible.

5. The CURE Guidelines

As with FAIR, we define four specific guidelines to improve best practices in developing mechanistic computational models of biological systems. The guidelines have no specific order, but the acronym CURE seemed appropriate for the topic in question.


CURE covers four key ideas, Credibility, Understandability, Reproducibility, and Extensibility, which we describe in the following sections. These guidelines are meant, where possible, to apply to both machine-readable formats, such as SBML, as well as models distributed in executable code such as MATLAB or Python. They also align with previous community efforts to address barriers in comprehensiveness, accessibility, reusability, interoperability, and reproducibility of computational models in systems biology [47].

5.1. Credible

We use credibility to mean a perceived measure of believability [48, 49]. A credible model makes trustworthy and actionable predictions within the range of conditions it was intended to simulate. Prior work on model credibility dates back to 1979 [50], when the Society for Modeling & Simulation International (SCS) described many concepts associated with model credibility. More recently, several groups within the biological modeling community have discussed model credibility, most notably the ten rules devised by the Committee on Credible Practice of Modeling & Simulation in Healthcare (CPMS) in the US [51], and, in Europe, Musuamba et al. [52], who describe in some detail the criteria and concepts that are important for assessing model credibility. Of note, the CPMS working group considered credibility a descriptor of the practice of modeling and simulation rather than of a model itself. Accordingly, the ten CPMS rules are geared towards evaluating the practices followed in modeling efforts [53]. Two core concepts related to model credibility are verification and validation [54].

Verification is “the process of determining that a model implementation accurately represents the developer’s conceptual description of the model and the solution to the model” [54]. Similar definitions can be found in many other documents [55]. In practice, verification means assessing the correctness of: (i) the model representation (e.g., detecting typographical errors); (ii) the numerical algorithms used to simulate the model (e.g., pseudo random numbers have the correct distributions); and (iii) the model implementation in software. Models implemented directly in programming languages such as Python may not have (i), and models implemented using standards such as SBML have few concerns about (iii).

When using standards such as SBML, verification involves ensuring that different software applications interpret the description of the model in the same way [56] and that the software has passed the SBML test suite [57]. It is worth noting the close correspondence of verification to software testing, including unit and system tests, and documentation [58]. Since models are almost always implemented in software, this should come as no surprise. Static tests can assess whether the model is correct, for example, that the biophysical laws have been entered correctly or that mass balance is not violated. Dynamic tests can offer more subtle checks of a model’s correctness. For example, a model whose variables are concentrations should never produce negative values.
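
Such dynamic checks can be automated in the spirit of software unit tests. Below is a minimal sketch (the function name, tolerance, and example trajectory are our own illustrations, not from any cited tool) that asserts non-negativity of concentrations and conservation of total mass along a simulated trajectory:

```python
# Illustrative automated dynamic verification: run after every simulation
# to catch negative concentrations or mass-balance violations in a closed
# system. The API is hypothetical; trajectories are lists of dicts
# mapping species name -> concentration.

def verify_trajectory(trajectory, total_mass, tol=1e-6):
    for i, state in enumerate(trajectory):
        for species, conc in state.items():
            if conc < 0:
                raise AssertionError(f"step {i}: {species} is negative ({conc})")
        if abs(sum(state.values()) - total_mass) > tol:
            raise AssertionError(f"step {i}: mass balance violated")
    return True

# Example: a closed two-species system where A converts to B.
states = [{"A": 10.0 - x, "B": x} for x in (0.0, 2.5, 5.0, 7.5, 10.0)]
ok = verify_trajectory(states, total_mass=10.0)
```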

Validation is “the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model” [54]. Of significant importance is the phrase “intended uses”. All models have a limited scope (including AI/ML models). That is, they can only make useful predictions within their intended purpose and calibration. This is particularly important for models that might be used in a clinical setting where misuse of a model could have dire consequences. It is important, therefore, that there be a clear statement of the purpose of the model as well as the conditions under which the model is applicable. Validation involves comparing experimental data to predictions made by the model. In practice, given the nature of scientific models, not all model predictions can be validated. Validation is more a measure of confidence in a model’s credibility to match reality than an absolute statement of truth.

As noted previously, mechanistic models often have parameters whose values are determined through parameter fitting. The credibility of the parameter estimates can impact the credibility of the model and can be enhanced in two ways, either by cross-validation or through the use of competing models. During cross-validation, the experimental dataset is split into a training and a test set. The test set is used to check whether the calibrated model can recapitulate the test set. If the model can recover the test set, then there is greater confidence in the model. If the model fails to recover the test set, then the model is inadequate and needs to be reexamined. Likewise, in the context of multi-scale simulations, model credibility can be assessed by experimentally documenting the system-wide response to at least two perturbations (e.g., stimulating the heart from distinct sites [59]). Then, a model parameterized using exclusively data based on the first set of measurements can be convincingly validated by demonstrating its ability to reproduce the second set (despite the model never having seen those data during calibration).

For parameter-free logic-based models in biology, the credibility lies in the causality of the statements used to build the logical rules and functions [60], as well as the binarized interpretation and use of small- and larger-scale experimental data [61].

This leads to the question of model selection [62]. That is, given the limited amount of experimental data, a given model will not necessarily be unique, and other models may be equally likely to explain the data. The literature on model selection is extensive, but certain popular tests have emerged, most notably the Akaike information criterion (AIC) [63] and the Bayesian information criterion (BIC) [64]. The AIC is an approximation, based on maximum likelihood, to the Kullback-Leibler divergence [65], which quantifies the difference between the model and full reality. The AIC considers both the quality of a fit and the number of parameters in the model. For example, given two models that fit the data equally well within experimental error, the model with fewer parameters is preferred. Any number of plausible models can be compared in this way since the computation is relatively straightforward. Kirk et al. provide an excellent review of this topic [62]. Model credibility can be enhanced if, given the uncertainty in the underlying biology, a number of models are proposed, with a measure such as the AIC used to indicate their relative credibility. The BIC is based on its relationship to the Bayes factor, itself the ratio of the likelihoods of two hypotheses, and is used to compare the marginal likelihoods of two models.
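
As a hedged sketch of AIC-based comparison (the data and both models below are synthetic examples of our own): for least-squares fits with Gaussian errors, the AIC can be computed, up to an additive constant, as n·ln(SSE/n) + 2p, where p is the number of fitted parameters. A nested two-parameter model always fits at least as well, but the AIC charges it for the extra parameter:

```python
# Illustrative AIC comparison of two nested models on synthetic data
# generated by the simpler one (y = 2x plus Gaussian noise).
import math
import random

random.seed(3)
xs = [i * 0.5 for i in range(1, 21)]
ys = [2.0 * x + random.gauss(0, 0.2) for x in xs]

def aic(sse, n, p):
    # AIC for least-squares fits with Gaussian errors, up to a constant.
    return n * math.log(sse / n) + 2 * p

# Model 1 (one parameter): y = a*x, closed-form least squares.
a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
sse1 = sum((y - a * x) ** 2 for x, y in zip(xs, ys))

# Model 2 (two parameters): y = slope*x + intercept, ordinary least squares.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
intercept = ybar - slope * xbar
sse2 = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys))

aic1, aic2 = aic(sse1, n, 1), aic(sse2, n, 2)
```

Because model 1 is nested within model 2, sse2 ≤ sse1 always holds; the AIC then asks whether the improvement in fit justifies the extra parameter.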

Uncertainty Quantification.

In recent years, there has been growing interest in the biomedical community in using uncertainty quantification (UQ) as part of a credibility assessment [66]. UQ has been well established in other scientific domains for many decades [67] and involves calculating how uncertainty in the experimental data and model parameters contributes to uncertainty in the model outputs. A model that generates highly uncertain outputs is seen as less credible. Verification, validation, and UQ are collectively referred to by many practitioners using the acronym VVUQ [55]. A recent article by Colebank et al. reviews elements of this topic ??.
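
A minimal Monte Carlo sketch of the UQ idea (the decay model, the assumed distribution for k, and all values are illustrative assumptions of our own): sample the uncertain parameter, propagate each sample through the model, and summarize the spread of the output:

```python
# Illustrative Monte Carlo uncertainty propagation: uncertainty in an
# estimated rate constant k is pushed through A(t) = A0 * exp(-k * t)
# and summarized as the mean and standard deviation of A at t = 2.
import math
import random
import statistics

random.seed(4)
A0, t = 10.0, 2.0
k_mean, k_sd = 0.5, 0.05   # assumed (hypothetical) distribution for k

samples = [A0 * math.exp(-max(random.gauss(k_mean, k_sd), 0.0) * t)
           for _ in range(5000)]
mean_out = statistics.mean(samples)
sd_out = statistics.stdev(samples)   # output uncertainty induced by k
```

A wide sd_out relative to mean_out would flag the prediction as poorly constrained; in practice, all uncertain parameters and data uncertainties would be sampled jointly rather than a single k.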

Another criterion that can enhance a model’s credibility is the provenance of the data used to build the model. Where did the data come from, and was it modified in some way? Table 1 summarizes some of the main criteria that can be used to assess credibility. However, not all the criteria are equally weighted; validation and verification are often the most important in this regard. The first four of the ten rules devised by the Committee on Credible Practice of Modeling & Simulation emphasize all these aspects [51].

Table 1:

A range of criteria that can establish the credibility of a model.

Attribute: Criteria

Validation: How well do the model outputs match reality?
Verification: Has the model been constructed without error? Are the simulation algorithms correctly encoded and operating without error?
Uncertainty: Have the effects of uncertainty on the model outputs been reported?
Provenance: Can the data used to calibrate or validate the model be traced to its original source?
Annotation: Have the inputs and outputs of the model been well defined?
Assumptions: Have the assumptions used in the model been made explicit?
Purpose: Has the purpose of the model been adequately described?
Scope: Has the scope, that is, the space within which the model can be used, been specified?
Unbiased: Was the model calibrated on unbiased data? This depends on the scope of the model: if a model is to be used across a diverse population, then clearly the calibration data needs to be diverse.

In many cases, modelers may opt not to consider some or all of these criteria in their work, primarily due to the burden of having to do the checks. We therefore recommend automating as many of these criteria as possible. For example, verification of SBML-based models can be achieved using the BioSimulation resource [68]. Validation tests could be provided in a standard format as part of the modeling code, just as software engineers often provide validation tests as part of their distributions [69]. Including information about data provenance and model assumptions is more difficult. However, the use of model standards such as SBML or CellML does allow models to be annotated with this information. The same applies to indicating the scope and purpose of a model. When models have an explicit representation (e.g., SBML) as opposed to just a software implementation, analysis tools can do deep dives into the model to examine its biophysical assumptions. More in-depth verification can also be done on the explicit model representation, such as detecting errors in the formulation of biophysical laws [70]. MEMOTE [71] is a successful example of automated software that can do deep dives into genome-scale models to assess their quality.

Credibility practices are well established in software development: commenting code, using version control for provenance, and applying unit and system tests for validation [58]. Verification is achieved through in-depth checking of the software by compilers and runtime systems.

5.2. Understandable

One of the aims of the scientific method is to gain an understanding of how the world works by proposing models and theories to be tested. We understand theories (to paraphrase [72]) to be bodies of knowledge that are broad in scope; chemical reaction theory and the central dogma of biology are examples. In contrast, models are instantiations of theories and, as a result, are narrower in scope, often representing a particular biological process of interest, such as computational models of metabolism or protein signaling networks. Models and theories are among the most important outcomes of the scientific method: they provide a way to rationalize a set of observations and make predictions about the future state of the system. However, the act of “understanding” [73] a model or theory is not an easy concept to define and may encompass a number of different aspects. In particular, how might one quantify “understanding”? Philosophically, de Regt and Dieks [73] consider a phenomenon understood if there exists an intelligible theory that describes it. However, such definitions are hard to quantify. Instead, we will focus on measures that can provide some level of confidence in how understandable a model is.

We note in passing that modern AI [74, 75] and machine learning algorithms can be very challenging, if not impossible, to understand. Understanding may not even be the goal of such approaches, where the underlying physical reality is entirely superfluous to the objective. As AI approaches become increasingly accurate at making predictions [76], understanding how systems work may come to seem unnecessary. There are a number of counterarguments to this view. One is that human beings have an innate desire to understand the world, which is difficult to suppress. In addition, understanding a model can lead to its improvement, which often yields transferable knowledge that can be applied to other problems. In a clinical situation, there is no guarantee that training has been sufficient to provide reliable diagnosis and treatment. If the system makes a mistake, we cannot easily determine why. In software engineering, writing understandable code is crucial to minimize maintenance costs, reduce bugs, and make the software more reliable and extensible [77–79]. The obvious counterargument is that the underlying mechanisms of many clinical treatments are either poorly understood or not understood at all, and yet they are useful and benefit a great many patients.

Biological systems, even small ones, are challenging to understand due to the nonlinear interactions among their components. The problem becomes even more acute as the systems we study grow larger. Biological systems often interact in complicated and nonlinear ways that result in emergent behaviors [80]. For example, no amount of understanding of the components of DNA – purines and pyrimidines, or, at a lower scale, carbon and hydrogen atoms – would lead us to predict its complex structure or understand its biological role; the role emerges from the way the components are put together.

How can we measure understanding? One way is to divide a model’s components and attributes into levels of perceived importance. For example, we could understand different aspects of a model, such as its purpose, its components, the biophysical laws that describe how the components interact, its inputs and outputs, its assumptions, and its limitations. Figure 1 depicts a pyramid that organizes such a hierarchy of ‘understanding’. At the base of the pyramid is the most rudimentary understanding; successive layers indicate increased levels of understanding. At the most rudimentary level (1), we want to know the purpose and objectives of the model as well as its inputs and outputs. Subsequent levels include the system components being modeled (2); the interactions between components that are modeled (3); a mathematical description of these interactions (4); a way to evaluate the mathematical model (e.g., solve a system of differential equations) (5); and finally, if possible, a general theory that explains the behavior of the model (6).

Fig. 1:

Quantifying understanding through a hierarchy of criteria.

Technology is already available to assist in identifying model components (Level 2) and interactions (Level 3), as well as adding metadata to a model (Level 1). Such information can be provided through model annotations [81]. The mathematical model (Level 4) also includes the model’s assumptions, which can be added as annotations using the SBO ontology [82]. For genome-scale metabolic models, the scientific community developed a list of detailed recommendations to annotate such models at Levels 1 to 4 following an extensive debate at their 2018 annual conference on constraint-based reconstruction and analysis (COBRA) [83]. Scientific models do not have to be mathematical, but for our purposes, most models are, and once a model is defined mathematically, it is often possible to convert the model into an executable form so that simulations can be carried out to generate predictions. Simulation information can be annotated using ontologies such as KiSAO [84]. The pinnacle of understanding is Level 6, which is the formulation of a theory that explains the behavior seen in the model.

Level 6 is the most difficult to quantify and involves explaining, in terms of both fundamental and higher-order concepts, how a given behavior emerges. Making this more difficult is that many biological systems show emergent behavior [80], where we observe behavior that is not found in any individual component of the system. Even simple systems can reach this threshold. For example, a system of only two components and some nonlinearity can show sustained, energy-dependent oscillations [28, 85]. Understanding the individual components is not sufficient to explain the origin of the oscillations [86]; a theory is required to understand how, as a system, oscillations arise. In this particular case, we need additional concepts, such as hysteresis and negative feedback, to understand the origin of the oscillations. For very complex systems, there may be multiple levels of abstraction that describe many levels of emergent behavior. Such abstractions are commonly used in electrical engineering and computer science and are what make it possible to engineer these highly complex systems. Reverse engineering abstraction layers in biology is, however, difficult [87, 88] because we are dealing with evolved systems that are not always as well structured as our own engineered systems. Unfortunately, no ontology exists that can adequately annotate how a system operates. The closest is TEDDY [82], which can be used to describe dynamics but does not, as yet, have the capacity to describe a theory of operation.
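
To make the two-component example concrete, the sketch below simulates the classic Brusselator (used here as a generic illustration of emergent oscillations, not a model from the cited works; parameter and step-size choices are our own). For B > 1 + A², neither equation in isolation oscillates, yet the coupled nonlinear system settles onto a sustained limit cycle:

```python
# The Brusselator, a minimal two-variable system with sustained
# oscillations for B > 1 + A**2, integrated with forward Euler.

def brusselator(A=1.0, B=3.0, dt=0.005, t_end=40.0):
    x, y = 1.5, 3.0                      # start near the unstable fixed point
    xs = []
    for _ in range(int(t_end / dt)):
        dx = A + x * x * y - (B + 1.0) * x
        dy = B * x - x * x * y
        x += dx * dt
        y += dy * dt
        xs.append(x)
    return xs

xs = brusselator()
late = xs[len(xs) // 2:]                 # discard the initial transient
amplitude = max(late) - min(late)        # a sustained swing, not decay to a point
```

A nonzero late-time amplitude is the emergent, system-level observation; nothing in either equation alone predicts it.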

5.3. Reproducible

Reproducibility is a cornerstone of the scientific method, and over the last 10–15 years much discussion has been devoted to this topic [89, 90], mostly concerning the lack of reproducibility of many scientific results. We often think of wet-lab experiments in biology as difficult to reproduce because they are multifaceted and inherently variable. However, it has also been discovered that computational experiments are often not reproducible [19]. This is surprising given that computation involves well-defined and often deterministic procedures. We will not revisit the many issues and recommendations concerning the reproducibility of computational experiments [17, 18, 49, 91, 92], but one thing that has become clear is that community standards such as SBML have significantly improved the reproducibility of computational models by providing a machine-readable representation in a standard format [19, 93]. Moreover, recent evidence supports the opinion that reproducible models are cited more and are more likely to be reused in subsequent studies [94]. Even if reproducibility is not a high priority for its own sake, verification, validation, and reuse of models all require a model to be reproducible [95]. In recent years, we have seen the emergence of software tools and standards that assist modelers in creating reproducible models [70, 96–98]. One could even go so far as to say that, in systems biology at least, the problem of reproducibility is solved [91].

5.4. Extensible and reusable

Science builds on past efforts, standing on the proverbial shoulders of giants. This is no different when building computational models, where past models can be enhanced, reused, and further validated. Unfortunately, many published models cannot be easily reused [51], because they were published without an explicit representation of the model. Instead, the model is embedded in a software implementation such as MATLAB or Python. The software implementation adds many complexities (e.g., file interfaces and control logic for computing a solution), often spread across multiple files with minimal documentation. Moreover, models deployed in this way are in a mathematical representation that loses considerable biological information. For example, when a biochemical pathway model is reduced to a set of differential equations, the stoichiometric structure of the network is either lost or difficult to reverse engineer. A further concern is that the mathematical representation greatly complicates the ability to query models to discover similar pathways, kinetics, and other characteristics. These considerations are some of the reasons why genome-scale models are published using SBML [11, 71], so as to preserve as much biological information as possible.
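The loss of stoichiometric structure can be illustrated directly. In the sketch below, a hypothetical three-reaction linear pathway (with made-up rate constants) keeps the stoichiometric matrix N explicit and derives the ODEs as dS/dt = N·v; the network structure then survives in the code, whereas hand-flattened equations would obscure it:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical linear pathway:  -> S1 -> S2 ->
# Rows are species, columns are reactions v1, v2, v3.
N = np.array([[1, -1,  0],   # S1: produced by v1, consumed by v2
              [0,  1, -1]])  # S2: produced by v2, consumed by v3

k0, k1, k2 = 1.0, 2.0, 4.0   # made-up rate constants

def rates(s):
    s1, s2 = s
    return np.array([k0, k1 * s1, k2 * s2])  # v1, v2, v3

def dsdt(t, s):
    return N @ rates(s)      # dS/dt = N v  -- structure stays explicit

sol = solve_ivp(dsdt, (0.0, 20.0), [0.0, 0.0], rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])  # steady state is [k0/k1, k0/k2] = [0.5, 0.25]
```

From N alone a reader (or a program) can recover which reactions produce and consume each species, which is exactly the information lost once the equations are written out by hand.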

Hence, the primary concern with publishing a model solely as its software implementation is that it is difficult to reuse the model, either in part or in whole. For example, combining a model written in MATLAB with a model written in Python can be a costly exercise. Likewise, converting models from one format to another, for example, MATLAB to SBML, can also be error-prone and costly [20]. Standards such as SBML allow the automated deconstruction [26] and reuse of models [99] through the use of model annotations. Models expressed in SBML are much easier to reuse or extend. When converted to a human-readable language like Antimony [100], reuse becomes even easier.
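For illustration, a small linear pathway written in Antimony occupies only a few lines and remains directly readable in biological terms; the model, species, and parameter names below are invented for this sketch:

```
model simple_pathway
  // reactions with mass-action rate laws
  J1:     -> S1;  k0
  J2: S1  -> S2;  k1*S1
  J3: S2  ->   ;  k2*S2

  k0 = 1;  k1 = 2;  k2 = 4
  S1 = 0;  S2 = 0
end
```

A description such as this can be converted losslessly to SBML, so the same source serves both human readers and simulation tools.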

Other disciplines employ formats such as Modelica [101] or representations such as Simulink [102] to assist in reuse, but such techniques have not been widely used in developing biological models.

If executable code is used to represent models, then the model should ideally be partitioned into reusable program functions with ample documentation to illustrate how to reuse the model and what the various symbols in the mathematical equations represent.
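One minimal way to achieve this partitioning in Python is sketched below; the species, kinetics, and parameter values are illustrative inventions, not taken from any published model. The model definition is a pure function with no file I/O or solver logic, so it can be imported and reused independently of the runtime code:

```python
import numpy as np
from scipy.integrate import solve_ivp

# --- model definition (could live in its own module, e.g. model.py) ---
DEFAULT_PARAMS = {"k1": 0.5, "k2": 0.1}  # illustrative values

def rhs(t, state, params=DEFAULT_PARAMS):
    """Right-hand side of a toy two-species model: S1 -> S2 -> (degraded)."""
    s1, s2 = state
    k1, k2 = params["k1"], params["k2"]
    return np.array([-k1 * s1, k1 * s1 - k2 * s2])

# --- runtime code, kept separate from the model itself ---
def run(t_end=10.0, y0=(1.0, 0.0)):
    """Drive the simulation; swapping solvers requires no change to rhs()."""
    return solve_ivp(rhs, (0.0, t_end), y0, rtol=1e-8, atol=1e-10)

sol = run()
print(sol.y[:, -1])
```

Because `rhs` carries no implementation baggage, another study can import it, change the parameters, or embed it in a larger model without untangling the simulation driver.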

6. Recommended Requirements

The previous discussion provides a wide range of criteria that can be used to satisfy the CURE guidelines, and fulfilling all the requirements would be quite onerous. For most academic studies, it might be sufficient to meet a small number of criteria. For models that might be used in a clinical setting, it would be prudent to satisfy as many criteria as possible. Organizations such as the FDA (Food and Drug Administration) have already begun to issue guidelines for models used in medical devices [103], and there is no reason to doubt that such guidance will eventually be offered for more general use of models in clinical settings.

For academic research, we therefore wish to propose a recommended set of requirements from each aspect of CURE that would be sufficient to significantly impact computational modeling. We provide a checklist in Figure 2, which also highlights the baseline requirements.

Fig. 2:

Recommended or baseline requirements for CURE that could be used for research-based models. The baseline requirements are highlighted in shaded pink boxes. A score out of ten can be given based on the number of criteria met; for example, meeting the baseline requirements yields a score of 6/10. Note that if publishing models via source code, it is essential to specify the version number of the software platform as well as the version numbers of any dependency packages used. Containerization platforms such as Docker can sometimes help in these situations.
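As a sketch of what such version pinning might look like, the Dockerfile below fixes both the platform and the dependency versions; the file names and version numbers are invented for illustration:

```dockerfile
# Pin the language runtime exactly, not just a major version.
FROM python:3.11.9-slim

# requirements.txt should pin every dependency, e.g. "numpy==1.26.4".
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Hypothetical layout: model.py holds the model definition,
# runner.py drives the simulation.
COPY model.py runner.py ./
CMD ["python", "runner.py"]
```

Anyone rebuilding this image years later obtains the same software stack, which removes one common source of irreproducibility.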

A key requirement is the need to develop standardized approaches for assessing and communicating the extent to which a given model, or modeling study, satisfies the recommended or baseline requirements. One example is the approach taken by the CPMS working group, which developed a rubric that considers the extent of outreach to various stakeholders in satisfying credible practice guidelines [53]. Such evaluations, typically conducted as self-assessments by the study authors, can be shared with the community as part of the supplementary material in published studies [104, 105]. Automation could greatly facilitate this assessment process, especially if the tooling is available in advance and used during model development.
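An automated self-assessment could be as simple as tallying which criteria a study satisfies. The sketch below computes a score out of ten; the criterion names are hypothetical placeholders, not an official CURE checklist:

```python
# A toy self-assessment tally; the criterion names are placeholders.
CRITERIA = [
    "objectives_and_scope_stated",
    "model_verified",
    "model_validated",
    "uncertainty_quantified",
    "limitations_documented",
    "machine_readable_model",
    "comprehensive_documentation",
    "model_annotated",
    "open_standard_used",
    "code_and_data_shared",
]

def cure_score(met):
    """Return a score out of 10 given the set of criteria a study meets."""
    satisfied = sum(1 for c in CRITERIA if c in met)
    return f"{satisfied}/{len(CRITERIA)}"

print(cure_score({"model_verified", "model_validated",
                  "open_standard_used"}))  # -> 3/10
```

Run during model development, such a tally would flag unmet criteria while they are still cheap to address.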

6.1. Baseline Requirements

If the recommended requirements are still too onerous, it is possible to define a baseline requirement. This term refers to the essential or foundational standards necessary for basic credibility, even though they do not meet the full recommended requirements specified by CURE. The baseline requirement is similar in intent to the recent report [106] from the US National Academies, which emphasizes purpose, verification, validation, uncertainty quantification, and reproducibility, though its statement on reproducibility is vague. We include scope and model limitations in our baseline, which the National Academies report does not explicitly mention, though they could be considered part of the statement of purpose. The baseline requirements are highlighted in Figure 2, and a more detailed summary is given in Table 2.

Table 2:

Summary of Recommended and Baseline Requirements.

Credibility (Baseline requirement):
1. Clearly define the objectives and scope of the modeling study, including the biological question being addressed and the specific hypotheses to be tested.
2. Use consistent notation and terminology to ensure clarity in model descriptions. Where possible, follow the notational conventions of the community.
3. Where possible, verify the model by running it in other simulators. Use model-checking tools to identify errors in the model [58].
4. Validate the model against experimental data using accepted statistical procedures, such as cross-validation, to assess model accuracy and predictive power.
5. Where possible, perform uncertainty quantification (UQ) to assess how sensitive the model is to uncertainty in parameter estimates, model structure, inputs, and assumptions.
6. Clearly document the limitations of the model, including areas where assumptions may not hold or where uncertainties exist, to provide context for interpreting results and guiding future research.
Understandability:
1. Provide a representation of the model that is both machine-readable and human-readable. Ideally, this should be in an open community standard and be an explicit representation of the model that is not intertwined with control logic, file input/output, and other implementation details.
2. Provide comprehensive documentation that explains the model structure, equations, and parameter values, e.g., by following the suggestions by Carey et al. [83]. When using open community standards such as SBML or CellML, submit the models to a recognized model repository. If the model is expressed in a programming language, deposit your model code at repositories such as GitHub, BioModels [24], or ModelDB [31].
3. Where possible, annotate the model to clarify any ambiguous terminology. When using a programming language to express a model, use commenting to annotate the model.
4. Try to document all assumptions made during model development, including simplifications, approximations, and parameter values, to provide transparency and facilitate reproducibility.
5. Try to provide a clear graphical illustration of the model. If the model is a biochemical network, use machine-readable formats such as SBGN [107], preferably together with community modeling standards such as SBML Layout [108] and Render [109].
Reproducibility (Baseline requirement):
1. Follow established standards and guidelines for model development, such as the Systems Biology Markup Language (SBML) and Minimum Information Requested in the Annotation of Models (MIRIAM), to enhance interoperability and facilitate model sharing and exchange. If using executable code, make sure best practices for code development are used.
2. Embrace and promote open science practices by openly sharing publicly funded models, data, and code with the scientific community, promoting transparency, reproducibility, and collaboration.
Extensibility and Reuse:
1. Use open modeling standards where possible. If using executable code such as Python, separate the model code from the runtime code. In this form, the model code, in principle, can be reused with minimal effort by other Python users. However, this would require careful commenting on the components of the model. One approach is to provide the model as a software function that can be called by other code. Try to use open-source licensing so that there are no restrictions on the reuse of the research.
2. If a model is represented using a modeling format such as SBML, reuse should be much easier since the model is expressed in biological terms. If the model is annotated, automated systems can be devised to merge models or disassemble them into individual parts for reuse.
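A lightweight analogue of the cross-simulator verification recommended under Credibility (item 3) is to run the same model under two independent integration algorithms and compare the trajectories. In the sketch below, two SciPy methods stand in for two independent simulators, and the model is an invented one-parameter decay:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy model: first-order decay dx/dt = -k x (k invented for illustration).
def decay(t, x, k=0.7):
    return -k * x

t_eval = np.linspace(0.0, 5.0, 51)
runs = {}
for method in ("RK45", "LSODA"):   # two independent integration algorithms
    sol = solve_ivp(decay, (0.0, 5.0), [1.0], method=method,
                    t_eval=t_eval, rtol=1e-9, atol=1e-12)
    runs[method] = sol.y[0]

# Agreement between independent implementations raises confidence that
# neither is silently wrong for this model.
max_diff = np.max(np.abs(runs["RK45"] - runs["LSODA"]))
print(max_diff < 1e-6)  # -> True
```

Services such as the BioSimulations verification facility mentioned in the Conclusion apply the same idea at scale, running a submitted model across multiple community simulators.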

7. Conclusion

This paper introduces a set of guidelines for developing robust, credible biological models. These guidelines are meant to complement the FAIR guidelines for data. As with FAIR, we propose a four-word moniker, CURE, that defines four core attributes of good modeling practice. These include credibility, understandability, reproducibility, and extensibility.

The need for CURE models is highlighted by recent interest in Biomedical Digital Twins, a technology that will rely on having robust models of biomedical systems [5, 106, 110], but it is clear that such guidelines would also benefit the wider biological modeling communities.

For credibility, we recommend the use of verification, validation, and uncertainty quantification; for understandability, we discuss the clarity of model descriptions and the importance of annotations; for reproducibility, our focus is standards and open science practices; and for extensibility, we emphasize open standards and the use of modular models. We outline recommended requirements for each guideline and propose a baseline level below the recommended requirements that is largely in alignment with the National Academies report [106].

While all of the CURE principles are important, we wish to highlight credibility, the first principle. Credibility is the degree to which a model can be trusted when applied to a given problem. In a biomedical application, where the concern is with patients and their well-being, the trustworthiness of a predictive model is paramount. Interestingly, the FDA recently published a guidance document [103] on assessing the credibility of computational models, though it applies only to medical devices. Many of its recommendations relate to model verification and validation, but it also addresses model plausibility: whether the governing equations, assumptions, and model parameters are reasonable. The FDA document also emphasizes UQ for estimating uncertainty in the model outputs. These practices are considered foundational for credible modeling but have not yet significantly permeated the biomedical modeling community. The recent National Academies report on Digital Twins [106] emphasizes the same points and stresses the critical need for data and modeling standards to enable interoperability and reuse. As an initial effort to support VVUQ, the BioSimulations resource [68] provides a verification service in which a given model can be run against multiple independent simulators to help verify the simulation engines.

Finally, automation is key to making the CURE guidelines workable and practical, reducing the burden on practitioners and accelerating widespread adoption.

Acknowledgments.

This work was supported by NIH Biomedical Imaging and Bioengineering award P41 EB023912 through HMS at the Center for Reproducible Biomedical Modeling (https://reproduciblebiomodels.org/). The content expressed here is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, or the University of Washington. HMS wishes to thank Eric Johnson Chavarria for suggesting the CURE acronym at the 2023 IMAG meeting in Bethesda, MD. HMS also wishes to thank Hunter Robbins for assistance in collating the author names and addresses.

T.E.G. was supported by a Royal Society University Research Fellowship (URF\R\221008) and the UKRI-BBSRC Engineering Biology Mission Award CYBER (BB/Y007638/1).

S.F. is supported by the Predictive Phenomics Initiative under the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, operated by Battelle for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830. J.F. was supported by DARPA through the Automating Scientific Knowledge Extraction and Modeling (ASKEM) program, Agreement No. HR0011262087; NSF awards IIS-2106888 and CMMI-2146306. The views, opinions, and findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense, the U.S. Government, or NSF.

H.M.S. acknowledges that research reported in this publication was supported by NIBIB of the National Institutes of Health under award number P41EB023912.

R.L. acknowledges funding from the following awards: NIH 1 R01 HL169974-01, U.S. DoD DARPA HR00112220038, NIH 1 R011AI135128-01.

R.V. acknowledges funding from the following awards: National Institute on Alcohol Abuse and Alcoholism R01 AA018873, National Heart, Lung, and Blood Institute R01 HL161696. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

NR was funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2075 - 390740016.

G.D.B acknowledges work was supported by NRNB (U.S. National Institutes of Health, National Center for Research Resources grant number P41 GM103504).

J.H.G. acknowledges that research reported in this publication was supported by NIBIB of the National Institutes of Health under award number P41EB023912.

L.M.L acknowledges work was supported by NIH grant R24 GM137787 from the National Institute of General Medical Sciences.

J.L.S acknowledges funding from the following award: DST/NRF SARCHI-82813.

D.v.N. acknowledges funding from the following award: DST/NRF SRUG2204173612.

P.M. acknowledges work was supported by NIH grant R24 GM137787 from the National Institute of General Medical Sciences.

F.F. acknowledges work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2242), the UK Medical Research Council (CC2242), and the Wellcome Trust (CC2242).

J.R.F acknowledges support from NIH grants P41GM10371 and R01GM115805.

T.J.S. acknowledges funding from NSF grant 2000281.

A.D. acknowledges support by the German Center for Infection Research (DZIF), grant № 8020708703.

J.M.R. acknowledges funding from the following award: NRF grant number SRUG2204295377.

Footnotes

Ethics declarations

Competing interests

The authors declare no competing interests.

1

Another organization devoted to promoting curation practices for research compendia also uses the acronym CURE: “Curating for Reproducibility”. However, there is little overlap between the two usages.

References

  • [1].Wilkinson M.D., Dumontier M., Aalbersberg I.J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.-W., Silva Santos L.B., Bourne P.E., et al. : The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3(1), 1–9 (2016) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Ravi N., Chaturvedi P., Huerta E., Liu Z., Chard R., Scourtas A., Schmidt K., Chard K., Blaiszik B., Foster I.: FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy. Scientific Data 9(1), 657 (2022) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Raissi M., Perdikaris P., Karniadakis G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics 378, 686–707 (2019) [Google Scholar]
  • [4].Juarez M.G., Botti V.J., Giret A.S.: Digital twins: Review and challenges. Journal of Computing and Information Science in Engineering 21(3), 030802 (2021) [Google Scholar]
  • [5].Laubenbacher R., Adler F., An G., Castiglione F., Eubank S., Fonseca L.L., Glazier J., Helikar T., Jett-Tilton M., Kirschner D., et al. : Toward mechanistic medical digital twins: some use cases in immunology. Frontiers in Digital Health 6, 1349595 (2024) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Michael C.T., Almohri S.A., Linderman J.J., Kirschner D.E.: A framework for multi-scale intervention modeling: virtual cohorts, virtual clinical trials, and model-to-model comparisons. Front. Syst. Biol. 3 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Dill K.A., MacCallum J.L.: The protein-folding problem, 50 years on. Science 338(6110), 1042–1046 (2012) [DOI] [PubMed] [Google Scholar]
  • [8].McCammon J.A., Gelin B.R., Karplus M.: Dynamics of folded proteins. Nature 267(5612), 585–590 (1977) [DOI] [PubMed] [Google Scholar]
  • [9].Niarakis A., Laubenbacher R., An G., Ilan Y., Fisher J., Flobak Å., Reiche K., Rodríguez Martínez M., Geris L., Ladeira L., Veschini L., Blinov M.L., Messina F., Fonseca L.L., Ferreira S., Montagud A., Noël V., Marku M., Tsirvouli E., Torres M.M., Harris L.A., Sego T.J., Cockrell C., Shick A.E., Balci H., Salazar A., Rian K., Hemedan A.A., Esteban-Medina M., Staumont B., Hernandez-Vargas E., Martis B S., Madrid-Valiente A., Karampelesis P., Sordo Vieira L., Harlapur P., Kulesza A., Nikaein N., Garira W., Malik Sheriff R.S., Thakar J., Tran V.D.T., Carbonell-Caballero J., Safaei S., Valencia A., Zinovyev A., Glazier J.A.: Immune digital twins for complex human pathologies: applications, limitations, and challenges. npj Systems Biology and Applications 10(1), 141 (2024) 10.1038/s41540-024-00450-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Laubenbacher R., Niarakis A., Helikar T., An G., Shapiro B., Malik-Sheriff R.S., Sego T.J., Knapp A., Macklin P., Glazier J.A.: Building digital twins of the human immune system: toward a roadmap. npj Digital Medicine 5(1), 64 (2022) 10.1038/s41746-022-00610-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Hucka M., Finney A., Sauro H.M., Bolouri H., Doyle J.C., Kitano H., Arkin A.P., Bornstein B.J., Bray D., Cornish-Bowden A., et al. : The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4), 524–531 (2003) [DOI] [PubMed] [Google Scholar]
  • [12].Clerx M., Cooling M.T., Cooper J., Garny A., Moyle K., Nickerson D.P., Nielsen P.M., Sorby H.: CellML 2.0. Journal of Integrative Bioinformatics 17(2–3), 20200021 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Gleeson P., Crook S., Cannon R.C., Hines M.L., Billings G.O., Farinella M., Morse T.M., Davison A.P., Ray S., Bhalla U.S., et al. : NeuroML: a language for describing data driven models of neurons and networks with a high degree of biological detail. PLoS Computational Biology 6(6), 1000815 (2010) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Gu C., Kim G.B., Kim W.J., Kim H.U., Lee S.Y.: Current status and applications of genome-scale metabolic models. Genome biology 20, 1–18 (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].King Z.A., Lu J., Dräger A., Miller P., Federowicz S., Lerman J.A., Ebrahim A., Palsson B.O., Lewis N.E.: BiGG models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research 44(D1), 515–522 (2016) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Arkin A.P., Cottingham R.W., Henry C.S., Harris N.L., Stevens R.L., Maslov S., Dehal P., Ware D., Perez F., Canon S., et al. : KBase: the united states department of energy systems biology knowledgebase. Nature Biotechnology 36(7), 566–569 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Blinov M.L., Gennari J.H., Karr J.R., Moraru I.I., Nickerson D.P., Sauro H.M.: Practical resources for enhancing the reproducibility of mechanistic modeling in systems biology. Current Opinion in Systems Biology 27, 100350 (2021) [Google Scholar]
  • [18].Porubsky V., Smith L., Sauro H.M.: Publishing reproducible dynamic kinetic models. Briefings in Bioinformatics 22(3), 152 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [19].Tiwari K., Kananathan S., Roberts M.G., Meyer J.P., Sharif Shohan M.U., Xavier A., Maire M., Zyoud A., Men J., Ng S., et al. : Reproducibility in systems biology modelling. Molecular Systems Biology 17(2), 9982 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Erdem C., Mutsuddy A., Bensman E.M., Dodd W.B., Saint-Antoine M.M., Bouhaddou M., Blake R.C., Gross S.M., Heiser L.M., Feltus F.A., et al. : A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling. Nature Communications 13(1), 3555 (2022) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Niarakis A., Kuiper M., Ostaszewski M., Malik Sheriff R.S., Casals-Casas C., Thieffry D., Freeman T.C., Thomas P., Touré V., Noël V., Stoll G., Saez-Rodriguez J., Naldi A., Oshurko E., Xenarios I., Soliman S., Chaouiya C., Helikar T., Calzone L.: Setting the basis of best practices and standards for curation and annotation of logical models in biology—highlights of the [bc]2 2019 colomoto/sysmod workshop. Briefings in Bioinformatics 22(2), 1848–1859 (2020) 10.1093/bib/bbaa046 https://academic.oup.com/bib/article-pdf/22/2/1848/36654102/bbaa046.pdf [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Blais E.M., Chavali A.K., Papin J.A.: Linking genome-scale metabolic modeling and genome annotation. Systems Metabolic Engineering: Methods and Protocols, 61–83 (2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Passi A., Tibocha-Bonilla J.D., Kumar M., Tec-Campos D., Zengler K., Zuniga C.: Genome-scale metabolic modeling enables in-depth understanding of big data. Metabolites 12(1), 14 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Malik-Sheriff R.S., Glont M., Nguyen T.V., Tiwari K., Roberts M.G., Xavier A., Vu M.T., Men J., Maire M., Kananathan S., et al. : BioModels—15 years of sharing computational models in life science. Nucleic Acids Research 48(D1), 407–415 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Novère N.L., Finney A., Hucka M., Bhalla U.S., Campagne F., Collado-Vides J., Crampin E.J., Halstead M., Klipp E., Mendes P., et al. : Minimum information requested in the annotation of biochemical models (miriam). Nature Biotechnology 23(12), 1509–1515 (2005) [DOI] [PubMed] [Google Scholar]
  • [26].Neal M.L., Galdzicki M., Gallimore J., Sauro H.M.: A c library for retrieving specific reactions from the biomodels database. Bioinformatics 30(1), 129–130 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Ingalls B.P.: Mathematical Modeling in Systems Biology: an Introduction. MIT press, ??? (2013) [Google Scholar]
  • [28].Sauro H.M.: Systems Biology: Introduction to Pathway Modeling. Ambrosius Publishing, ??? (2020) [Google Scholar]
  • [29].Kim O.D., Rocha M., Maia P.: A review of dynamic modeling approaches and their application in computational strain optimization for metabolic engineering. Frontiers in Microbiology 9, 378885 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Peters M., Eicher J.J., Van Niekerk D.D., Waltemath D., Snoep J.L.: The jws online simulation database. Bioinformatics 33(10), 1589–1590 (2017) [DOI] [PubMed] [Google Scholar]
  • [31].McDougal R.A., Morse T.M., Carnevale T., Marenco L., Wang R., Migliore M., Miller P.L., Shepherd G.M., Hines M.L.: Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience. Journal of computational neuroscience 42, 1–10 (2017) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Choi J.-S., Waxman S.G.: Physiological interactions between nav1. 7 and nav1. 8 sodium channels: a computer simulation study. Journal of Neurophysiology 106(6), 3173–3184 (2011) [DOI] [PubMed] [Google Scholar]
  • [33].Lopez C.F., Muhlich J.L., Bachman J.A., Sorger P.K.: Programming biological models in python using pysb. Molecular systems biology 9(1), 646 (2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Brandl G.: Sphinx documentation. URL http://sphinx-doc.org/sphinx.pdf (2021) [Google Scholar]
  • [35].Plank G., Loewe A., Neic A., Augustin C., Huang Y.-L., Gsell M.A., Karabelas E., Nothstein M., Prassl A.J., Sánchez, J., et al. : The opencarp simulation environment for cardiac electrophysiology. Computer methods and Programs in Biomedicine 208, 106223 (2021) [DOI] [PubMed] [Google Scholar]
  • [36].Britten R.D., Christie G.R., Little C., Miller A.K., Bradley C., Wu A., Yu T., Hunter P., Nielsen P.: Fieldml, a proposed open standard for the physiome project for mathematical model representation. Medical & biological engineering & computing 51, 1191–1207 (2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Schaff J.C., Lakshminarayana A., Murphy R.F., Bergmann F.T., Funahashi A., Sullivan D.P., Smith L.P.: Sbml level 3 package: spatial processes, version 1, release 1. Journal of Integrative Bioinformatics 20(1), 20220054 (2023) [Google Scholar]
  • [38].Mendes P., Kell D.: Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 14(10), 869–883 (1998) [DOI] [PubMed] [Google Scholar]
  • [39].Gábor A., Banga J.R.: Robust and efficient parameter estimation in dynamic models of biological systems. BMC Systems Biology 9, 1–25 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Ashyraliyev M., Fomekong-Nanfack Y., Kaandorp J.A., Blom J.G.: Systems biology: parameter estimation for biochemical models. The FEBS Journal 276(4), 886–902 (2009) [DOI] [PubMed] [Google Scholar]
  • [41].Wieland F.-G., Hauber A.L., Rosenblatt M., Tönsing, C., Timmer, J.: On structural and practical identifiability. Current Opinion in Systems Biology 25, 60–69 (2021) [Google Scholar]
  • [42].Shockley E.M., Vrugt J.A., Lopez C.F.: PyDREAM: high-dimensional parameter inference for biological models in python. Bioinformatics 34(4), 695–697 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Irvin M.W., Ramanathan A., Lopez C.F.: Model certainty in cellular network-driven processes with missing data. PLoS Computational Biology 19(4), 1011004 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Nikolados E., Wongprommoon A., Aodha, al., O.M.: Accuracy and data efficiency in deep learning models of protein expression. Nature Communications 13(7755) (2022) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Zittrain J.: The hidden costs of automated thinking. The New Yorker; (2019) [Google Scholar]
  • [46].Islam N., Ebrahimzadeh S., Salameh J., Kazi S., Fabiano N., Treanor L., Absi M., Hallgrimson Z., Leeflang M., Hooft L., et al. : How accurate is chest imaging for diagnosing COVID-19. Cochrane (2021) [Google Scholar]
  • [47].Niarakis A., Waltemath D., Glazier J., Schreiber F., Keating S.M., Nickerson D., Chaouiya C., Siegel A., Noël V., Hermjakob H., Helikar T., Soliman S., Calzone L.: Addressing barriers in comprehensiveness, accessibility, reusability, interoperability and reproducibility of computational models in systems biology. Briefings in Bioinformatics 23(4), 212 (2022) 10.1093/bib/bbac212 https://academic.oup.com/bib/article-pdf/23/4/bbac212/57554122/bbac212.pdf [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [48].Yilmaz L., Liu B.: Model credibility revisited: Concepts and considerations for appropriate trust. Journal of Simulation 16(3), 312–325 (2022) [Google Scholar]
  • [49].Tatka L.T., Smith L.P., Hellerstein J.L., Sauro H.M.: Adapting modeling and simulation credibility standards to computational systems biology. Journal of Translational Medicine 21(1), 501 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].Schlesinger S.: Terminology for model credibility. Simulation 32(3), 103–104 (1979) [Google Scholar]
  • [51].Erdemir A., Mulugeta L., Ku J.P., Drach A., Horner M., Morrison T.M., Peng G.C., Vadigepalli R., Lytton W.W., Myers J.G. Jr: Credible practice of modeling and simulation in healthcare: ten rules from a multidisciplinary perspective. Journal of Translational Medicine 18(1), 369 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Musuamba F., Skottheim Rusten I., Lesage R., et al. : Scientific and regulatory evaluation of mechanistic in silico drug and disease models in drug development: building model credibility. CPT Pharmacometrics and Systems Pharmacology 10, 804–825 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [53].Vadigepalli R., Manchel A., Erdemir A., Mulugeta L., Ku J.P., Rego B.V., Horner M., Lytton W.W., Myers J.G.: A rubric for assessing conformance to the ten rules for credible practice of modeling and simulation in healthcare. medRxiv, 2024–10 (2024) [DOI] [PubMed] [Google Scholar]
  • [54].Thacker B., Doebling S., Hemez F., Anderson M., Pepin J., Rodriguez E.: Concepts of model verification and validation (No. LA-14167). Los Alamos National Lab., Los Alamos, NM (US) (2004) [Google Scholar]
  • [55].Council N.R., Engineering D., Sciences P., Mathematical Sciences B., Applications T., Verification C., Quantification U.: Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification. National Academies Press, ??? (2012) [Google Scholar]
  • [56].Bergmann F.T., Sauro H.M.: Comparing simulation results of SBML capable simulators. Bioinformatics 24(17), 1963–1965 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Hucka M., Independent A.F., He E., Keating S.: SBML Test Suite (2022). 10.5281/zenodo.1112521 [DOI] [Google Scholar]
  • [58].Hellerstein J.L., Gu S., Choi K., Sauro H.M.: Recent advances in biomedical simulations: a manifesto for model engineering. F1000Research 8 (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [59].Corrado C., Williams S., Karim R., Plank G., O’Neill M., Niederer S.: A work flow to build and validate patient specific left atrium electrophysiology models from catheter measurements. Medical image analysis 47, 153–163 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [60].Touré V., Flobak Å., Niarakis A., Vercruysse S., Kuiper M.: The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling. Briefings in Bioinformatics 22(4), bbaa390 (2020) 10.1093/bib/bbaa390 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Hall B.A., Niarakis A.: Data integration in logic-based models of biological mechanisms. Current Opinion in Systems Biology 28, 100386 (2021) 10.1016/j.coisb.2021.100386 [DOI] [Google Scholar]
  • [62].Kirk P., Thorne T., Stumpf M.P.: Model selection in systems and synthetic biology. Current opinion in biotechnology 24(4), 767–774 (2013) [DOI] [PubMed] [Google Scholar]
  • [63].Akaike H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6), 716–723 (1974) [Google Scholar]
  • [64].Beik S.P., Harris L.A., Kochen M.A., Sage J., Quaranta V., Lopez C.F.: Unified tumor growth mechanisms from multimodel inference and dataset integration. PLOS Computational Biology 19(7), 1011215 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [65].Kullback S., Leibler R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951) [Google Scholar]
  • [66].Viceconti M., Pappalardo F., Rodriguez B., Horner M., Bischoff J., Tshinanu F.M.: In silico trials: Verification, validation and uncertainty quantification of predictive models used in the regulatory evaluation of biomedical products. Methods 185, 120–127 (2021) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [67].Booker J.M., Ross T.J.: An evolution of uncertainty assessment and quantification. Scientia Iranica 18(3), 669–676 (2011) [Google Scholar]
  • [68].Shaikh B., Smith L.P., Vasilescu D., Marupilla G., Wilson M., Agmon E., Agnew H., Andrews S.S., Anwar A., Beber M.E., et al. : BioSimulators: a central registry of simulation engines and services for recommending specific tools. Nucleic Acids Research 50(W1), 108–114 (2022) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [69].unittest — Unit testing framework. https://docs.python.org/3/library/unittest.html [Google Scholar]
  • [70].Hoops S., Sahle S., Gauges R., Lee C., Pahle J., Simus N., Singhal M., Xu L., Mendes P., Kummer U.: COPASI—a complex pathway simulator. Bioinformatics 22(24), 3067–3074 (2006) [DOI] [PubMed] [Google Scholar]
  • [71].Lieven C., Beber M.E., Olivier B.G., Bergmann F.T., Ataman M., Babaei P., Bartell J.A., Blank L.M., Chauhan S., Correia K., et al. : MEMOTE for standardized genome-scale metabolic model testing. Nature Biotechnology 38(3), 272–276 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].Fried E.I.: Theories and models: What they are, what they are for, and what they are about. Psychological Inquiry 31(4), 336–344 (2020) [Google Scholar]
  • [73].de Regt H.W.: Understanding Scientific Understanding. Oxford University Press, Oxford (2017) [Google Scholar]
  • [74].Chang Y., Wang X., Wang J., Wu Y., Yang L., Zhu K., Chen H., Yi X., Wang C., Wang Y., et al. : A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology 15(3), 1–45 (2024) [Google Scholar]
  • [75].Thirunavukarasu A.J., Ting D.S.J., Elangovan K., Gutierrez L., Tan T.F., Ting D.S.W.: Large language models in medicine. Nature medicine 29(8), 1930–1940 (2023) [DOI] [PubMed] [Google Scholar]
  • [76].Price I., Sanchez-Gonzalez A., Alet F., et al.: Probabilistic weather forecasting with machine learning. Nature (2024) 10.1038/s41586-024-08252-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [77].Arvanitou E.-M., Ampatzoglou A., Chatzigeorgiou A., Carver J.C.: Software engineering practices for scientific software development: A systematic mapping study. Journal of Systems and Software 172, 110848 (2021) [Google Scholar]
  • [78].Cain J.Y., Yu J.S., Bagheri N.: The in silico lab: Improving academic code using lessons from biology. Cell Systems 14(1), 1–6 (2023) 10.1016/j.cels.2022.11.006 [DOI] [PubMed] [Google Scholar]
  • [79].Lteif G.: Writing Clean Code — How It Impacts the Future of Your Product. https://softwaredominos.com/home/software-design-development-articles/writing-quality-clean-code/ (2024) [Google Scholar]
  • [80].Nicolis G., Prigogine I.: Exploring Complexity: An Introduction. W.H. Freeman, New York (1989) [Google Scholar]
  • [81].Le Novère N.: Model storage, exchange and integration. BMC Neuroscience 7, 1–9 (2006) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [82].Courtot M., Juty N., Knüpfer C., Waltemath D., Zhukova A., Dräger A., Dumontier M., Finney A., Golebiewski M., Hastings J., et al. : Controlled vocabularies and semantics in systems biology. Molecular Systems Biology 7(1), 543 (2011) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [83].Carey M.A., Dräger A., Beber M.E., Papin J.A., Yurkovich J.T.: Community standards to facilitate development and address challenges in metabolic modeling. Molecular Systems Biology 16(8), 9235 (2020) 10.15252/msb.20199235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [84].Zhukova A., Waltemath D., Juty N., Laibe C., Le Novère N.: Kinetic simulation algorithm ontology. Nature Precedings, 1–1 (2011) [Google Scholar]
  • [85].Kholodenko B.N.: Cell-signalling dynamics in time and space. Nature reviews Molecular cell biology 7(3), 165–176 (2006) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [86].Reijenga K.A., Megen Y.M., Kooi B.W., Bakker B.M., Snoep J.L., Verseveld H.W., Westerhoff H.V.: Yeast glycolytic oscillations that are not controlled by a single oscillophore: a new definition of oscillophore strength. Journal of Theoretical Biology 232(3), 385–398 (2005) [DOI] [PubMed] [Google Scholar]
  • [87].Csete M.E., Doyle J.C.: Reverse engineering of biological complexity. Science 295(5560), 1664–1669 (2002) [DOI] [PubMed] [Google Scholar]
  • [88].Andrews S.S., Wiley H.S., Sauro H.M.: Design patterns of biological cells. BioEssays, 2300188 (2024) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [89].Baker M.: 1,500 scientists lift the lid on reproducibility. Nature 533(7604) (2016) [DOI] [PubMed] [Google Scholar]
  • [90].Plesser H.E.: Reproducibility vs. replicability: a brief history of a confused terminology. Frontiers in neuroinformatics 11, 76 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [91].Porubsky V.L., Goldberg A.P., Rampadarath A.K., Nickerson D.P., Karr J.R., Sauro H.M.: Best practices for making reproducible biochemical models. Cell Systems 11(2), 109–120 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [92].Shin J., Porubsky V., Carothers J., Sauro H.M.: Standards, dissemination, and best practices in systems biology. Current Opinion in Biotechnology 81, 102922 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [93].Mendes P.: Reproducible research using biomodels. Bulletin of Mathematical Biology 80(12), 3081–3087 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [94].Höpfl S., Pleiss J., Radde N.E.: Bayesian estimation reveals that reproducible models in systems biology get more citations. Scientific Reports 13(1), 2695 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [95].Coveney P.V., Groen D., Hoekstra A.G.: Reliability and reproducibility in computational science: Implementing validation, verification and uncertainty quantification in silico. Philosophical Transactions of the Royal Society A 379(2197) (2021) [DOI] [PubMed] [Google Scholar]
  • [96].Moraru I.I., Schaff J.C., Slepchenko B.M., Blinov M., Morgan F., Lakshminarayana A., Gao F., Li Y., Loew L.M.: Virtual Cell modelling and simulation software environment. IET systems biology 2(5), 352–362 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [97].Choi K., Medley J.K., König M., Stocking K., Smith L., Gu S., Sauro H.M.: Tellurium: an extensible python-based modeling environment for systems and synthetic biology. Biosystems 171, 74–79 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [98].Medley J.K., Choi K., König M., Smith L., Gu S., Hellerstein J., Sealfon S.C., Sauro H.M.: Tellurium notebooks—an environment for reproducible dynamical modeling in systems biology. PLoS Computational Biology 14(6), 1006220 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [99].Neal M.L., Cooling M.T., Smith L.P., Thompson C.T., Sauro H.M., Carlson B.E., Cook D.L., Gennari J.H.: A reappraisal of how to build modular, reusable models of biological systems. PLoS Computational Biology 10(10), 1003849 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [100].Smith L.P., Bergmann F.T., Chandran D., Sauro H.M.: Antimony: a modular model definition language. Bioinformatics 25(18), 2452–2454 (2009) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [101].Mattsson S.E., Elmqvist H., Otter M.: Physical system modeling with modelica. Control Engineering Practice 6(4), 501–510 (1998) [Google Scholar]
  • [102].Dabney J.B., Harman T.L.: Mastering Simulink vol. 230. Pearson/Prentice Hall Upper Saddle River, New Jersey; (2004) [Google Scholar]
  • [103].Food and Drug Administration, et al. : Assessing the credibility of computational modeling and simulation in medical device submissions. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/assessing-credibility-computational-modeling-and-simulation-medical-device-submissions (2021)
  • [104].Verma A., Antony A.N., Ogunnaike B.A., Hoek J.B., Vadigepalli R.: Causality analysis and cell network modeling of spatial calcium signaling patterns in liver lobules. Frontiers in physiology 9, 1377 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [105].Gee M.M., Lenhoff A.M., Schwaber J.S., Ogunnaike B.A., Vadigepalli R.: Closed-loop modeling of central and intrinsic cardiac nervous system circuits underlying cardiovascular control. AIChE Journal 69(4), 18033 (2023) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [106].National Academies of Sciences, Engineering and Medicine, et al. : Foundational research gaps and future directions for digital twins. https://nap.nationalacademies.org/catalog/26894/foundational-research-gaps-and-future-directions-for-digital-twins (2023) [PubMed] [Google Scholar]
  • [107].Le Novère N., Hucka M., Mi H., Moodie S., Schreiber F., Sorokin A., Demir E., Wegner K., Aladjem M.I., Wimalaratne S.M., et al. : The systems biology graphical notation. Nature Biotechnology 27(8), 735–741 (2009) [DOI] [PubMed] [Google Scholar]
  • [108].Gauges R., Rost U., Sahle S., Wegner K.: A model diagram layout extension for SBML. Bioinformatics 22(15), 1879–1885 (2006) [DOI] [PubMed] [Google Scholar]
  • [109].Bergmann F.T., Keating S.M., Gauges R., Sahle S., Wengler K.: SBML Level 3 package: Render, Version 1, Release 1. Journal of Integrative Bioinformatics 15(1), 20170078 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [110].Bhagirath P., Strocchi M., Bishop M.J., Boyle P.M., Plank G.: From bits to bedside: Entering the age of digital twins in cardiac electrophysiology. Europace, in press (2024) [DOI] [PMC free article] [PubMed] [Google Scholar]
