Summary
In systems biology modeling, important steps include model parameterization, uncertainty quantification, and evaluation of agreement with experimental observations. To help modelers perform these steps, we developed the software PyBioNetFit, which in addition supports checking models against known system properties and solving design problems. PyBioNetFit introduces Biological Property Specification Language (BPSL) for the formal declaration of system properties. BPSL allows qualitative data to be used alone or in combination with quantitative data. PyBioNetFit performs parameterization with parallelized metaheuristic optimization algorithms that work directly with existing model definition standards: BioNetGen Language (BNGL) and Systems Biology Markup Language (SBML). We demonstrate PyBioNetFit's capabilities by solving various example problems, including the challenging problem of parameterizing a 153-parameter model of cell cycle control in yeast based on both quantitative and qualitative data. We demonstrate the model checking and design applications of PyBioNetFit and BPSL by analyzing a model of targeted drug interventions in autophagy signaling.
Subject Areas: Biological Sciences, Bioinformatics, Systems Biology, Complex Systems, Computer Science, Parallel System
Highlights
• PyBioNetFit is a software tool for parameterizing systems biology models
• PyBioNetFit has support for uncertainty quantification, model checking, and design
• BPSL enables formulation of qualitative system properties to use in fitting
• Example problems are demonstrated on single workstations and on computer clusters
Introduction
An important step in the development of a mathematical model for a biological system is using experimental data to identify model parameters. In a conventional approach, the experimental data of most utility are quantitative time courses and/or dose-response curves. Parameters are adjusted to minimize the difference between the model outputs and the experimental data (as measured, for example, by a residual sum-of-squares function).
In some cases, there are straightforward solutions for parameter identification. For example, software tools such as Data2Dynamics (Raue et al., 2015) and COPASI (Hoops et al., 2006) implement practical parameterization methods for biological applications. These programs can, for example, use gradient-based optimization to solve the benchmark problems of Raue et al. (2013) and Hass et al. (2019). These problems feature ODE models, which typically consist of tens of equations. One contains 500 equations. As powerful and practical as Data2Dynamics and COPASI are, not all biological models fall into a category that can be solved with these tools. When current software tools are inadequate, modelers must resort to either problem-specific code or manual adjustment of parameters. Both these approaches are tedious from the perspective of the modeler and also present challenges for reproducibility of the modeling work (Medley et al., 2016, Waltemath and Wolkenhauer, 2016). Therefore, there is strong motivation to expand the scope of problems that can be solved using general-purpose software compatible with standard model definition formats.
We developed the software PyBioNetFit to solve three major classes of parameterization problems for which current software solutions are limited. (1) Problems with larger than usual numbers of ODEs. The size of an ODE-fitting problem depends primarily on two considerations: the number of differential equations and the number of free parameters. Parameterization cost typically has a dependence on both these quantities, but the relative importance depends on the method used for parameterization. Large problem size in terms of equation count often arises when using rule-based modeling. Rule-based modeling is the preferred approach for processes in which a combinatorial explosion in the number of possible chemical species makes it challenging to enumerate every possible chemical reaction (Chylek et al., 2013, Faeder et al., 2005). In a rule-based model, a concise set of rules can be expanded to generate a much larger system of ODEs (hundreds to thousands of equations from a model with tens of rules). Although the number of equations grows large, the number of parameters remains proportional to the number of rules, which is typically much smaller than the number of rule-implied reactions. In this way, rule-derived ODE models differ from manually formulated ODE models, which typically have a parameter count proportional to the number of reactions. For ODE systems at the scale typically found in rule-based modeling, gradient-based methods using finite differences or forward sensitivity analysis are computationally expensive. Adjoint sensitivity analysis enables more scalable gradient computation (Cao et al., 2002), but current software is limited in the supported workflows. (2) Problems featuring models that are simulated stochastically. This class of problems includes rule-based models in which the implied ODE system is so large that it cannot be derived from rules or numerically integrated efficiently (Sneddon et al., 2011, Suderman et al., 2019). 
In such cases, the objective function is not differentiable, so standard gradient-based methods cannot be used. (3) Problems including unconventional experimental data, in particular non-numerical qualitative data. Such datasets are often collected by experimentalists and have the potential to inform model parameterization (Mitra et al., 2018), but currently are rarely used in practice. Notable exceptions are works of Tyson and coworkers (Chen et al., 2000, Chen et al., 2004, Csikász-Nagy et al., 2006, Kraikivski et al., 2015, Oguz et al., 2013) and Pargett and coworkers (Pargett and Umulis, 2013, Pargett et al., 2014).
We address problems (1) and (2) by using parallelized metaheuristic optimization algorithms in place of gradient-based algorithms. Metaheuristics are a well-established class of optimization algorithms that do not rely on gradient information. If gradient information is available, metaheuristics can benefit by working in combination with gradient-based methods (Villaverde et al., 2019), as in memetic algorithms (Neri et al., 2012). Metaheuristics carry no guarantee for convergence to a global optimum but are found to be effective in many use cases (Gandomi et al., 2013). Examples of metaheuristics include differential evolution (Storn and Price, 1997), particle swarm optimization (Eberhart and Kennedy, 1995), and scatter search (Glover et al., 2000). Such algorithms often include some type of iterative randomized selection of candidate parameter sets, followed by evaluation of the selected parameter sets, which is used to direct the selection of parameters in future iterations to more favorable regions of parameter space. Many modern descriptions of metaheuristics allow for parallelized evaluation of parameter sets (Moraes et al., 2015, Penas et al., 2015, Penas et al., 2017), which is valuable when each model simulation is computationally expensive. Although these algorithms are well-established, software designed for biological applications is limited. COPASI (Hoops et al., 2006) and Data2Dynamics (Raue et al., 2015) both include metaheuristic algorithms, but these algorithms are not parallelized, which limits their performance with computationally intensive models. The software BioNetFit (Thomas et al., 2016) (called BioNetFit 1 in this report to distinguish it from the newly developed software) was an early effort to use a parallelized evolutionary algorithm to parameterize rule-based biological models. 
However, the BioNetFit 1 algorithm is inefficient in many cases. More generally, optimization algorithm performance is problem dependent, so a toolbox of methods is needed to enable a wide range of problems to be solved efficiently. PyBioNetFit was inspired by BioNetFit 1 but is an entirely new code base that includes multiple robust metaheuristic algorithms.
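To make the metaheuristic idea concrete, here is a minimal serial sketch of differential evolution in Python. This is an illustration, not PyBioNetFit's implementation; the function name and default settings are ours. The per-candidate objective evaluations marked below are the independent steps that parallelized implementations distribute across workers.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin sketch (illustrative, not PyBioNetFit code).
    The objective evaluations within each generation are independent and
    could be farmed out to parallel workers."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]  # parallelizable
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = [
                min(max(pop[a][d] + F * (pop[b][d] - pop[c][d]),
                        bounds[d][0]), bounds[d][1])
                if rng.random() < CR else pop[i][d]
                for d in range(dim)
            ]
            s = objective(trial)  # parallelizable across candidates
            if s < scores[i]:     # greedy selection directs the search
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]
```

Evaluations of trial parameter sets dominate the cost when each simulation is expensive, which is why parallelizing this step pays off.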
We address problem (3) by following the approach of Mitra et al. (2018) for parameterizing models using both qualitative and quantitative data. In this approach, properties of interest are represented as one or more inequality constraints on the outputs of a model, enforced during some portion of a simulation. In some cases, a single qualitative observation, such as the viability of a particular mutant, implies several system properties (inequalities). For example, in a model of the yeast cell cycle (Laomettachit et al., 2016), if a yeast strain is viable, three variables representing bud formation, origin activation, and spindle assembly must each exceed a specified threshold. After defining inequalities, we cast each inequality as a static penalty function (Smith and Coit, 1997), added to the objective function to be minimized. The result is a scalar-valued objective function with contributions from both qualitative and quantitative data; this function is minimized during fitting. This approach is fairly straightforward, and it has been demonstrated to be effective for parameterization of biological models using qualitative data (Mitra et al., 2018). An important feature, in contrast to other constrained optimization methods, is that some of the inequality constraints are allowed to remain unsatisfied (because they arise from uncertain experimental data).
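The combined objective can be sketched in a few lines of Python. This is an illustrative sketch of the static-penalty idea, not PyBioNetFit's actual objective-function code; the function and argument names are ours.

```python
def combined_objective(residuals, constraint_values, weights):
    """Scalar objective mixing quantitative and qualitative data (sketch).

    residuals: (model - data) values for the quantitative points.
    constraint_values: g(y_hat) for each inequality written as g(y_hat) < 0.
    weights: static penalty coefficient C for each constraint.

    A violated constraint (g > 0) adds C * g, so the penalty shrinks as the
    model moves toward satisfying the inequality; a satisfied constraint
    contributes nothing.
    """
    rss = sum(r * r for r in residuals)
    penalty = sum(C * g for g, C in zip(constraint_values, weights) if g > 0)
    return rss + penalty
```

Because the penalty is proportional to the degree of violation, the objective remains informative even when a parameter set satisfies none of the constraints.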
To extend this approach for use in general-purpose software, we require a language to express arbitrary system properties of interest. In systems biology, there is no established means for formalizing system properties, although attempts have been made to do so with temporal logic (Clarke et al., 2008, David et al., 2012, Heath et al., 2008, Kwiatkowska et al., 2008), sometimes as part of model parameterization (Hussain et al., 2015, Khalid and Jha, 2018, Liu and Faeder, 2016). There is a lack of software tools tailored for biological modeling that support property specification languages—most studies that incorporate temporal logic do so with problem-specific code. In addition, there are few demonstrations of how the formalism of temporal logic, originally developed for computer science applications (Clarke et al., 1986), can be applied to describe biologically interesting properties such as case-control comparisons. To address these deficiencies, we developed the Biological Property Specification Language (BPSL) as part of PyBioNetFit. BPSL is a domain-specific language for declaration of biological system properties and allows such properties to be used as part of parameterization.
To complement its parameterization features, PyBioNetFit includes methods for uncertainty quantification of parameter estimates. Bayesian uncertainty quantification can be performed using Markov chain Monte Carlo (MCMC) with the Metropolis-Hastings (MH) algorithm (reviewed by Chib and Greenberg, 1995) or parallel tempering (reviewed by Earl and Deem, 2005). These methods start with an assumed prior probability distribution for each parameter, and a likelihood function, and aim to sample the multidimensional posterior probability distribution of the parameters given the data. Simulations can be performed using sampled parameter sets to quantify the uncertainty of model predictions. PyBioNetFit also supports bootstrapping, which performs uncertainty quantification by resampling data (Efron and Tibshirani, 1993, Press et al., 2007).
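The MH scheme can be illustrated with a minimal random-walk sampler for a single parameter. This is a conceptual sketch under a one-dimensional Gaussian-proposal assumption, not PyBioNetFit's MCMC or parallel-tempering implementation.

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples=5000, step=0.5, seed=0):
    """Minimal random-walk Metropolis-Hastings sampler (illustrative sketch).

    log_post: log of the (unnormalized) posterior density.
    x0: starting parameter value; step: proposal standard deviation.
    Returns the list of sampled parameter values."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)        # symmetric proposal
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples
```

Simulating the model with parameter sets drawn this way propagates parameter uncertainty into uncertainty of model predictions.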
Although PyBioNetFit and BPSL were designed primarily for model parameterization, BPSL also enables formalized approaches to model checking, somewhat as in computer science (Clarke et al., 1999), and design, somewhat as in optimal control. For our application, we define model checking as performing verification of whether a model reproduces a set of specified properties. Applications of formal model checking to biological processes have been considered in earlier work, including for stochastic models (Clarke et al., 2008, Heath et al., 2008, Kwiatkowska et al., 2008). Much more often, model checking in biology is done informally as part of building a model. However, as models become more detailed, with an increasing number of known properties, a more formal and systematic system of model checking is useful: it can help in communicating what knowledge went into building the model and for comparing the predictions of different models. Design represents a related application, analogous to the classical use of constrained optimization techniques. In a design problem in PyBioNetFit, we seek an intervention (a perturbation of a parameterized model) that brings about a desired set of BPSL-defined system behaviors, for example, choosing drug doses to up- or down-regulate the activity of a target pathway.
All the above-mentioned features of PyBioNetFit are designed to be used in conjunction with existing model definition standards, avoiding the need for problem-specific code. PyBioNetFit natively supports models defined in BioNetGen Language (BNGL) (Faeder et al., 2009), a language for rule-based models, and core SBML (Hucka et al., 2003), a language for more conventional models. For BNGL models, PyBioNetFit supports the simulators available in BioNetGen (Harris et al., 2016, Faeder et al., 2009, Blinov et al., 2004, Sneddon et al., 2011). For SBML models, PyBioNetFit uses the simulator libRoadRunner (Somogyi et al., 2015). PyBioNetFit has a modular design that makes it possible to add support for additional model standards and simulators in the future. Currently, other model standards are indirectly supported by converting to BNGL or SBML. For example, rule-based models defined in the Kappa language (Danos and Laneve, 2004, Sorokina et al., 2013) can be converted to BNGL using the software tool TRuML (Suderman and Hlavacek, 2017).
To demonstrate the capabilities of PyBioNetFit, we solved a series of example optimization problems. We solved a total of 31 problems, 25 of which featured published, biologically relevant models (Blinov et al., 2006, Boehm et al., 2014, Brännmark et al., 2010, Chylek et al., 2014, Dunster et al., 2014, Erickson et al., 2019, Faeder et al., 2003, Fey et al., 2015, Harmon et al., 2017, Hlavacek et al., 2018, Kocieniewski et al., 2012, Kozer et al., 2013, Kühn and Hillmann, 2016, Lee et al., 2003, Mitra et al., 2018, Monine et al., 2010, Mukhopadhyay et al., 2013, Oguz et al., 2013, Romano et al., 2014, Shirin et al., 2019, Suderman and Deeds, 2013, Webb et al., 2011, Zheng et al., 2012). With four of these problems, we performed extensive benchmarking using different algorithms and different levels of parallelization. Not surprisingly, we find that the optimal algorithm depends on the fitting problem, which demonstrates the value of having a toolbox of several algorithms available. We then focus on a particularly challenging example problem: parameterizing the model of Tyson and co-workers for cell cycle control in yeast (Chen et al., 2000, Chen et al., 2004, Csikász-Nagy et al., 2006, Kraikivski et al., 2015, Oguz et al., 2013). This model was originally parameterized by hand-tuning (Chen et al., 2000, Chen et al., 2004) and later by automated optimization with problem-specific code (Mitra et al., 2018, Oguz et al., 2013). Here we consider our most recent description of the problem (Mitra et al., 2018), which has a 153-dimensional parameter space. We define the problem using BPSL and solve it using the general-purpose functionality of PyBioNetFit. Thus we demonstrate that PyBioNetFit can solve this general class of problem, that of using both qualitative and quantitative data to parameterize a biological model.
Finally, we considered a model describing drug intervention in autophagy signaling (Shirin et al., 2019) to demonstrate the capabilities of PyBioNetFit and BPSL beyond model parameterization. We show that BPSL can be used to define a set of system properties, which can then be used in model checking. We also demonstrate how BPSL can be used to configure a design problem, finding a combination of drug doses to achieve a desired level of autophagy regulation.
Results
Workflow Enabled by PyBioNetFit
The steps involved in using PyBioNetFit are illustrated in Figure 1. PyBioNetFit is configured with a set of plain-text input files (Figure 1A). The input files must have particular filename extensions: .conf, .bngl, .xml, .exp, and .prop. We will refer to these as CONF files, BNGL files, etc. The files may be prepared in any standard text editor.
Figure 1.
Inputs, Outputs, and Operations of PyBioNetFit
(A) PyBioNetFit input files are a set of plain-text files: a CONF file specifying program settings, one or more model files in BNGL and/or SBML format, and one or more data files containing experimental data. EXP files contain quantitative data, and PROP files contain qualitative data. Examples of these files are shown in Figure 2.
(B) When running PyBioNetFit, the user-selected optimization algorithm generates candidate parameter sets, which are passed to the appropriate simulator (for SBML models, libRoadRunner; for BNGL models, the simulator selected in the BNGL file). PyBioNetFit calculates the value of a user-selected objective function from the simulation results obtained for each trial parameter set, which is then used to inform future iterations of the algorithm. Each simulation and objective function evaluation is started as a separate worker process, which is run on a separate core of a multicore workstation or cluster if available.
(C) PyBioNetFit output files include a text file reporting the best-fit parameter values, model files with the best-fit parameter settings, and output files resulting from simulating the models using the best-fit parameter values.
Figure 2 shows an example set of PyBioNetFit input files for a simple problem. The problem is to parameterize a model for the chemical kinetics of three reactions (Figure 2A) using synthetic quantitative and qualitative data (Figure 2B).
Figure 2.
A Fitting Problem Configured to Run in PyBioNetFit
(A) Reaction scheme of the model to be parameterized. The model is a coupled system of ODEs for the mass-action kinetics of the reactions shown here. The problem is to estimate values for the rate constants k1, k2, and k3. (B) Time courses of concentrations of species B and C. Black broken curves give the ground truth. For fitting, two quantitative data points (black points) and four qualitative data points (colored circles) are available. The qualitative data indicate whether the concentration of species B or C is larger at a particular time. A plus sign indicates B > C and a minus sign indicates B < C. Colored curves show the quality of fit after running PyBioNetFit on the input files listed in (C–F). (C) Implementation in BNGL of a model for the reaction scheme in (A). Note that the parameters to be tuned by PyBioNetFit are named with the suffix __FREE (lines 3–5). The parameter Ainit, which is to be held fixed, is not named with __FREE (line 6).
(D) EXP file containing the quantitative data points shown in (B). The keyword nan is used to indicate missing data.
(E) PROP file encoding the qualitative data points shown in (B).
(F) CONF file used to configure PyBioNetFit. As described in the main text, the CONF file specifies the paths to the other files, the algorithm to be used, and the free parameters to be adjusted. The files pictured here are available in Data S1 (Problem 5).
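To illustrate how ground-truth simulations of this kind generate synthetic data, here is a minimal sketch in Python. The scheme used here (a hypothetical linear cascade A → B → C with first-order degradation of C) is our own assumption for illustration and may differ from the actual scheme in Figure 2A.

```python
def simulate_cascade(k1, k2, k3, a0=1.0, t_end=10.0, dt=0.001):
    """Forward-Euler integration of a hypothetical mass-action scheme
    A -> B -> C -> (degraded), with rate constants k1, k2, k3.
    NOTE: illustrative scheme only; not the model of Figure 2A.
    Returns (times, A, B, C) sampled every 100 integration steps."""
    a, b, c = a0, 0.0, 0.0
    times, A, B, C = [], [], [], []
    n = int(t_end / dt)
    for i in range(n + 1):
        if i % 100 == 0:
            times.append(i * dt); A.append(a); B.append(b); C.append(c)
        da = -k1 * a
        db = k1 * a - k2 * b
        dc = k2 * b - k3 * c
        a += da * dt; b += db * dt; c += dc * dt
    return times, A, B, C
```

With rate constants such as k1 = 1.0, k2 = 0.5, k3 = 0.2, the trajectories cross (B > C early, C > B later), which is exactly the kind of behavior that qualitative plus/minus data points can encode.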
A model file (Figure 2C; filename extension .xml for SBML models or .bngl for BNGL models) defines the model to be fit. In the case of BNGL models, the model file also defines simulation protocols. BNGL allows for sophisticated simulation protocols such as equilibration to a basal steady state. For SBML models, simulation protocols must be defined in the CONF file (see below). BNGL files may be prepared with a standard text editor, or with the text editor available within RuleBender (Xu et al., 2011), an integrated development environment for BioNetGen. SBML files are not human readable and should be prepared using SBML-compatible software such as COPASI (Hoops et al., 2006) or Tellurium (Medley et al., 2018, Choi et al., 2018). Model files must conform to certain conventions for compatibility with PyBioNetFit. For BNGL files, each free parameter to be fit must be assigned a name ending in __FREE (Figure 2C lines 3–5) and each model output to be compared with measurements must be introduced as a BNGL observable or function. For SBML files, each free parameter must be an SBML parameter or the initial concentration of a species, and each model output to be compared with measurements must be an SBML species concentration or population. In addition, each simulation command must have an associated string identifier, called a suffix. If the simulation is defined in a BNGL file, the suffix is specified using the suffix argument of the simulate or param_scan action. If the simulation is defined in the CONF file (see description of the CONF file below), the suffix is specified as part of the time_course or parameter_scan declaration. The suffix must match the name of the corresponding experimental data file (e.g., “d1” in Figure 2). In the simplest case, a fitting job has one model file. 
However, PyBioNetFit supports jobs with multiple model files, such as the problem considered in the section Application: Fitting a Model of Yeast Cell Cycle Control Using Both Qualitative and Quantitative Data. This feature is useful when two or more models have parameters in common, such as two models that represent the same process in wild-type and mutant cells.
Experimental measurements are supplied in EXP (Figure 2D) and PROP (Figure 2E) files. EXP files contain tabular quantitative data, such as time courses or dose-response curves. These files are specified in a space-delimited format in which the first column corresponds to the independent variable and other columns correspond to dependent variables (the same format as is used in GDAT files output by BioNetGen). PROP files contain statements written using BPSL, which is described in the next section. PROP files are used for qualitative data.
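Data in the EXP format described above can be read with a few lines of Python. This is a sketch, not PyBioNetFit's own parser; we assume here that the header row may carry a leading '#', as in BioNetGen GDAT output.

```python
def parse_exp(text):
    """Parse EXP-style whitespace-delimited data (illustrative sketch).
    Assumes a header row naming the columns (first column = independent
    variable), optionally prefixed with '#' as in GDAT files; the keyword
    'nan' marks missing values and parses to float('nan')."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    rows = [[float(tok) for tok in ln.split()] for ln in lines[1:]]
    return header, rows
```

For example, a two-column file with a missing value in one row parses cleanly because Python's float() accepts the token "nan".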
A configuration file, or CONF file (Figure 2F), provides the settings for running a fitting job. These settings include which model and data files to use (line 3), which parameters will be free to vary in fitting (lines 15–17), which fitting algorithm to use (line 9), which objective function to use (line 10), and settings specific to the selected fitting algorithm (lines 11–12), such as the mutation rate in the case of differential evolution. Additional configuration keys are available to define simulation protocols (time courses or parameter scans), and to declare free parameters that vary in logarithmic rather than linear space. A complete listing of the available configuration keys is provided in the PyBioNetFit user manual (Mitra and Suderman, 2019).
After generating all the required files, a user can run PyBioNetFit from the command line, as described in the user manual (Mitra and Suderman, 2019). Figure 1B illustrates the internal operations of PyBioNetFit. PyBioNetFit iteratively passes proposed parameter sets to the appropriate simulator, reads the simulation results, and calculates the value of the user-selected objective function. The objective function values are fed back into the optimization algorithm and affect which parameter sets are proposed in future iterations. Upon termination of the algorithm, PyBioNetFit outputs the best-fit parameter values, new model files that include those parameter settings, and (optionally) simulation results generated from those model files (Figure 1C).
Property Specification with BPSL
To allow fitting to qualitative data, we implemented the approach described by Mitra et al. (2018) in PyBioNetFit. For this feature, we developed BPSL, a novel property specification language. BPSL is designed for writing system properties of cellular regulatory networks. In BPSL, system properties are expressed as inequalities involving the dependent variables of an experiment or model. We refer to such dependent variables in this section as simply “variables.” Typically, BPSL statements are written for the purpose of parameterizing a particular model, in which case the names of the variables should match the names of outputs of that model, similar to column headings of an EXP file. However, we note that like EXP files, BPSL statements primarily encode (experimental) data, and the same data could be considered in conjunction with any model for the system of interest (possibly only after changing the variable names to match the output names of the new model). Variables in BPSL are flexible: in addition to what is supported in EXP files—quantities corresponding to BNGL or SBML model outputs—it is possible to compare variables/readouts between different models/systems. One application of this feature would be case-control comparisons, such as comparing a mutant to wild-type.
Each inequality declared in BPSL is enforced at a particular value or range of values of the independent experimental variable (e.g., time). For example, an inequality might be enforced at one specific time, or at all times in a time course. As described below, BPSL syntax provides a means to define inequalities, where they are enforced, and how much they contribute to the objective function during optimization.
A BPSL statement consists of three parts: an inequality, followed by an enforcement condition, followed by a weight. The inequality establishes a relationship (<, >, ≤, or ≥) between a variable and a constant or between two variables. The enforcement condition specifies where in a time course or dose-response curve the constraint is in effect. Enforcement conditions are defined using the keywords always, once, at, and between, as summarized in Table 1. The weight (declared with the weight keyword) specifies the static penalty coefficient to be used during optimization when the inequality is not satisfied. Specifically, if the constraint g(ŷ)<0 is not satisfied for the model outputs ŷ, the objective function adds a penalty equal to C⋅g(ŷ) where C is the weight of the constraint. Note that, in this formulation, the penalty decreases as we move closer to satisfying the constraint. This feature of the objective function serves to guide an optimization algorithm toward constraint satisfaction.
Table 1.
Keywords Used to Define Enforcement Conditions in BPSL
| Keyword | Meaning |
|---|---|
| always | At all times |
| once | At one or more time points |
| at 〈condition〉 | At the first time point where 〈condition〉 is true |
| between 〈condition1〉, 〈condition2〉 | Over the range of time points starting with the first point where 〈condition1〉 is true and ending with the first subsequent time point where 〈condition2〉 is true |
Definitions assume that the independent variable is time, but any arbitrary independent variable may be considered, as when considering a steady-state dose-response curve instead of a time course.
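The always and once conditions amount to universal versus existential quantification over the output points, as in this Python sketch (illustrative only, not PyBioNetFit code; the function name is ours).

```python
import operator

def satisfied(values, op, bound, mode):
    """Check an inequality over a series of output points under the
    'always' (all points) or 'once' (at least one point) enforcement
    conditions of Table 1. op is '<' or '>'. Illustrative sketch only."""
    cmp = {"<": operator.lt, ">": operator.gt}[op]
    checks = [cmp(v, bound) for v in values]
    return all(checks) if mode == "always" else any(checks)
```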
We illustrate the use of BPSL with the following examples, assuming time course outputs X(t) and Y(t). The BPSL statement
    X > 5 always weight 2
defines a constraint requiring X(t) to be greater than 5 at all times. If the constraint is violated, a penalty of 2⋅(5−min(X(t))) is added to the objective function. The BPSL statement
    X < 1 between time = 8, Y = 5 weight 3
defines a constraint requiring X(t) to be less than 1 over a specified time range. The start point of this time range is specified directly: time = 8. The endpoint is specified indirectly based on the value of Y(t); it is the first time point after t = 8 where Y(t) = 5. More precisely, to avoid numerical error, PyBioNetFit checks when Y(t) crosses 5, i.e., finds, after t = 8, the first two consecutive output times t1 and t2 such that Y(t1)<5 ≤ Y(t2) or Y(t1)>5 ≥ Y(t2) and sets t2 as the endpoint. If the constraint is violated at any point in the above time range, the penalty is 3⋅(max(X(t))−1), where max(X(t)) is evaluated over the time range.
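The crossing detection and penalty evaluation just described can be sketched as follows, assuming discrete output times. This is an illustration of the logic, not PyBioNetFit's implementation; the function and argument names are ours.

```python
def between_penalty(times, X, Y, start_time, y_threshold, bound, weight):
    """Sketch of evaluating 'X < bound between time = start_time,
    Y = y_threshold weight C' on discrete trajectories (illustrative).

    Finds the first crossing of Y through y_threshold after start_time,
    then returns weight * (max X - bound) if X exceeds bound anywhere in
    the window, else 0."""
    # start index: first output time at or after start_time
    i0 = next(i for i, t in enumerate(times) if t >= start_time)
    # end index: first sign change of (Y - threshold) after the start
    i1 = len(times) - 1
    for i in range(i0, len(times) - 1):
        if (Y[i] - y_threshold) * (Y[i + 1] - y_threshold) <= 0:
            i1 = i + 1
            break
    worst = max(X[i0:i1 + 1])
    return weight * (worst - bound) if worst > bound else 0.0
```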
Metaheuristic Fitting Algorithms
PyBioNetFit features four recommended parallelized metaheuristic fitting algorithms, which we will refer to as differential evolution (DE), asynchronous differential evolution (aDE), particle swarm optimization (PSO), and scatter search (SS). The details of each algorithm's implementation and configuration options are provided in the PyBioNetFit user manual (Mitra and Suderman, 2019). Note that aDE and PSO are implemented as asynchronous algorithms, which address load-balancing issues by submitting a new simulation job whenever one is completed. Such an implementation prevents CPU cores from remaining idle, but requires new trial parameter sets to be proposed with limited new information. In contrast, our synchronous DE and SS algorithms require all simulations performed within an iteration to complete before moving on to the next iteration.
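The difference between synchronous and asynchronous scheduling can be illustrated with Python's concurrent.futures. This sketch shows only the scheduling pattern, using threads in place of PyBioNetFit's worker processes; propose is a hypothetical callback that generates the next trial from the results gathered so far.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def evaluate_async(objective, propose, n_total, n_workers=4):
    """Asynchronous evaluation sketch (illustrative, not PyBioNetFit code):
    keep every worker busy by submitting a new trial as soon as any
    running one finishes. A synchronous scheme would instead wait for a
    full batch of simulations to complete before proposing new trials."""
    results = []
    submitted = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = set()
        # prime the pool with one trial per worker
        while submitted < min(n_workers, n_total):
            pending.add(pool.submit(objective, propose(results)))
            submitted += 1
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
                if submitted < n_total:
                    # propose a new trial using whatever has finished so far
                    pending.add(pool.submit(objective, propose(results)))
                    submitted += 1
    return results
```

The trade-off described above is visible here: no core idles waiting for stragglers, but each new proposal is made with only the partial information available at submission time.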
To demonstrate the breadth of problems that can be solved using PyBioNetFit, we ran these algorithms on a total of 31 example problems, listed in Table 2. The problems are described in the following references: 1, Hass et al., 2019, Zheng et al., 2012; 2, Blinov et al., 2006, Gupta and Mendes, 2018; 3, Faeder et al., 2003, Sneddon et al., 2011, Gupta and Mendes, 2018; 4, Kozer et al., 2013, Thomas et al., 2016; 5, none; 6, Harmon et al. (2017); 7, Hlavacek et al. (2018); 8, Laomettachit, 2011, Oguz et al., 2013, Mitra et al., 2018; 9, Shirin et al. (2019); 10, Kozer et al., 2013, Thomas et al., 2016; 11, Monine et al., 2010, Posner et al., 2007, Thomas et al., 2016; 12, Chylek et al., 2014, Thomas et al., 2016; 13, Thomas et al. (2016); 14, Thomas et al. (2016); 15, Erickson et al., 2019, Kiselyov et al., 2009; 16, Romano et al. (2014); 17, Blinov et al., 2006, Gupta and Mendes, 2018; 18, Kocieniewski et al. (2012); 19, Mitra et al. (2018); 20, Mitra et al. (2018); 21, Dunster et al. (2014); Xue and Del Bigio (2000); 22, Boehm et al., 2014, Hass et al., 2019; 23, Brännmark et al., 2010, Hass et al., 2019; 24, Fey et al. (2015); 25, Webb et al. (2011); 26, Mukhopadhyay et al., 2013, Manz et al., 2011; 27, Lee et al. (2003); 28, Suderman and Deeds, 2013, Yi et al., 2003, Yu et al., 2008, Leeuw et al., 1998; 29, none; 30, Kühn and Hillmann (2016); 31, Hlavacek et al. (2018). See also Table S1. Input files, descriptions, and results for each of these problems are provided in Data S1, a ZIP archive containing 31 numbered folders, one for each example problem. We will refer to the problems by these numbers. For example, we will refer to the folder associated with Problem 1 in Table 2 as Data S1 (Problem 1). In some cases, we fit models to published experimental data. In other cases where no appropriate experimental dataset was available, we generated synthetic data by simulating the model with an assumed ground truth parameter set. 
The synthetic data included noise; depending on the problem, this was added as Gaussian white noise, uniformly distributed noise, or the noise inherent to a single stochastic simulation. In total, the example problems included 19 rule-based models defined in BNGL, nine of which were fit to experimental data; nine manually formulated ODE models defined in SBML, six of which were fit to experimental data; and three problems using closed-form functions. All the problems could be solved with an acceptable fit (defined as reaching a target objective function value, which is specified in Data S1 for each individual problem) with at least one of the available algorithms using the default algorithmic parameters. Most could be solved with all four algorithms tested, albeit with different efficiencies. We do not perform a comprehensive analysis of every model (which would entail varying algorithmic parameters, performing additional replicates of fitting, etc.), but with the fitting runs we performed, we illustrate that PyBioNetFit can be used to analyze a variety of SBML- and BNGL-formatted models.
Table 2.
Summary of the 31 Example Problems Provided in Data S1
| # | Key model component(s) | Data | Sim. | Pars. | Rxns. | Eqs. | Pts. | Sims. | Algs. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Histones | E | RR | 46 | 60 | 30 | 48 | 1 | D,A,P,S |
| 2 | EGFR, Grb2, Sos | S | B-ode | 37 | 3,749 | 356 | 40 | 1 | D,A,P,S |
| 3 | IgE receptor | S | B-ssa | 20 | 58,276 | 3,744 | 66 | 3 | D,A,P,S |
| 4 | EGFR | E | B-nf | 9 | – | – | 24 | 12 | D,A,P,S |
| 5 | Simple reactions | S | B-ode | 3 | 3 | 3 | 6* | 1 | D,A,P,S |
| 6 | Degranulation | E | B-ode | 16 | 86 | 23 | 6 | 11 | D,A,S |
| 7 | Egg-shaped curve | S | B-ode | 10 | 1 | 2 | 362 | 1 | D,A,P,S |
| 8 | Yeast cell cycle regulators | E | RR | 153 | 39 | 26 | 2352* | 122 | S |
| 9 | mTORC, ULK1, AMPK | D | RR | 6 | 5 | 5 | 2* | 1 | S |
| 10 | EGFR | E | B-ode | 9 | 11,918 | 923 | 24 | 6 | D,A,P,S |
| 11 | Trivalent ligand | E | B-nf | 3 | – | – | 12 | 36 | D,A,P,S |
| 12 | TCR | E | B-nf | 34 | – | – | 68 | 1 | D,A,P,S |
| 13 | Ligand/receptor | S | B-ode | 6 | 54 | 15 | 26 | 1 | D,A,P,S |
| 14 | Ligand/receptor | S | B-nf | 6 | – | – | 26 | 2 | D,A,P,S |
| 15 | IGF1R | E | B-ode | 7 | 96 | 27 | 38 | 38 | D,A,P,S |
| 16 | Raf, MST, ERK | S | RR | 63 | 31 | 21 | 60 | 1 | D,A,P,S |
| 17 | EGFR, Grb2, Sos | S | B-ssa | 37 | 3,749 | 356 | 40 | 3 | D,A |
| 18 | MAPK | S | B-ode | 13 | 487 | 85 | 28 | 7 | D,A,P,S |
| 19 | Raf inhibitor | S | B-ode | 2 | 12 | 6 | 28* | 13 | D,A,P,S |
| 20 | Raf inhibitor | S | B-ode | 4 | 12 | 6 | 28* | 13 | D,A,P,S |
| 21 | Immune cells | E | RR | 7 | 10 | 7 | 21 | 1 | D,A,P,S |
| 22 | STAT | E | RR | 6 | 9 | 8 | 48 | 1 | D,A,S |
| 23 | Insulin receptor | E | RR | 22 | 11 | 9 | 43 | 9 | D,S |
| 24 | Jnk | E | B-ode | 12 | 330 | 66 | 59 | 22 | D,A,P,S |
| 25 | Cells expressing Fas or FasL | S | RR | 11 | 17 | 7 | 64 | 16 | D,A,P,S |
| 26 | TCR | E | B-nf | 10 | – | – | 9 | 450 | P,S |
| 27 | Wnt, Axin, APC | S | RR | 25 | 17 | 15 | 68 | 1 | D,A,P |
| 28 | MAPK | E | B-nf | 25 | – | – | 96 | 2 | A |
| 29 | Schwefel function | S | RR | 2 | 0 | 1 | 1 | 1 | D,P,S |
| 30 | Job market | S | B-nf | 6 | – | – | 330 | 3 | D,A,P,S |
| 31 | Elephant-shaped curve | S | B-ode | 82 | 1 | 2 | 930 | 1 | D,A,P,S |
Table columns are summarized as follows. “Key model component(s)” lists some components of the model (but is not intended as a complete description of the model). “Data” gives the type of data used in fitting: E, experimental; S, synthetic; D, specification of desired system properties for a design problem. “Sim.” gives the simulator used: RR, libRoadRunner; B-ode, BioNetGen ODE; B-ssa, BioNetGen SSA; B-nf, NFsim. Note that models using libRoadRunner are implemented in SBML and models using the other three simulators are implemented in BNGL. “Pars.” gives the number of free parameters. “Rxns.” gives the number of chemical reactions in the model. “Eqs.” gives the number of differential equations in the model. Reaction and equation counts are not given for models simulated with NFsim because the simulation is run without enumerating all reactions and equations. “Pts.” gives the number of data points. When indicated (*), this total includes qualitative data points (i.e., inequality constraints). “Sims.” gives the number of individual time course simulations required for one evaluation of the objective function. “Algs.” lists the algorithms used to solve the problem: D, DE; A, aDE; P, PSO; S, SS.
To demonstrate additional specific features of PyBioNetFit, which were not feasible or applicable to run on all example problems, we selected specific problems from Table 2 to use for illustration, as indicated in Table 3. We will describe results for these illustrative problems in the sections that follow.
Table 3.
Example Problems Selected for Demonstrations of Additional Features of PyBioNetFit, Presented Throughout the Results Section
| Problem | Feature |
|---|---|
| 1–4 | Timed benchmarking of performance |
| 5 | Demonstration of configuration |
| 6 | Bayesian uncertainty quantification |
| 7 | Bootstrapping |
| 8 | Real-world problem using qualitative data |
| 9 | Model checking and design |
To evaluate which algorithms are most effective in typical use cases, we performed timed benchmarking. We used the default algorithmic parameters for each algorithm. Because of the stochastic nature of the algorithms, many replicates of the same fitting job were necessary to draw conclusions about the typical run time of each algorithm. As benchmark problems, we chose Problems 1–4 in Table 2, which have fitting run times on the order of hours. Such problems are not trivial, but it is still feasible to run many fitting replicates on a cluster.
To cover the full scope of PyBioNetFit functionality, we selected benchmarks that include one problem using each of the four key simulators supported in PyBioNetFit, which we refer to as libRoadRunner, BioNetGen ODE, BioNetGen SSA, and NFsim. These simulators are described briefly as follows. (1) libRoadRunner is an SBML simulator. By default, libRoadRunner interfaces with CVODE (Hindmarsh et al., 2005) to perform numerical integration. (2) BioNetGen ODE refers to the numerical integration capability of BioNetGen, accessed with the action simulate(method=>"ode"). Like libRoadRunner, this functionality interfaces with CVODE (Hindmarsh et al., 2005). (3) BioNetGen SSA refers to an efficient variation of Gillespie's stochastic simulation algorithm (Gillespie, 2006) implemented in BioNetGen, accessed with the action simulate(method=>"ssa"). (4) NFsim refers to the component of BioNetGen accessed with the action simulate(method=>"nf") that performs agent-based stochastic simulations without generation of a reaction network (Sneddon et al., 2011). For the stochastic simulators BioNetGen SSA and NFsim, PyBioNetFit performs smoothing by averaging a user-specified number of replicate runs before comparing the results to experimental data.
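The smoothing step can be sketched as follows (a minimal Python illustration; `run_simulation` is a hypothetical stand-in for one BioNetGen SSA or NFsim run, not part of PyBioNetFit's API):

```python
import random

def run_simulation(n_points, seed):
    """Hypothetical stand-in for one stochastic simulation run:
    a noisy exponential decay curve."""
    rng = random.Random(seed)
    return [100.0 * (0.9 ** t) + rng.gauss(0.0, 2.0) for t in range(n_points)]

def smoothed_trajectory(n_points, n_replicates):
    """Average n_replicates stochastic runs point-by-point, as
    PyBioNetFit does before comparing results to experimental data."""
    runs = [run_simulation(n_points, seed) for seed in range(n_replicates)]
    return [sum(vals) / n_replicates for vals in zip(*runs)]

smooth = smoothed_trajectory(n_points=10, n_replicates=50)
```

Averaging N replicates reduces the standard deviation of the stochastic noise by roughly a factor of √N, at the cost of N simulations per objective function evaluation.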
For each of the benchmark problems, we chose a target objective function value (described for each problem in Data S1) and measured the run times required for each algorithm to reach the target value. We evaluated the run times of the four algorithms and also measured how the run times scaled with an increasing number of available cores on a cluster. As described in Transparent Methods, we adjusted the population size of each algorithm based on the core count, such that each iteration used all available cores. The resulting distributions of run times are shown in Figure 3. We found that in most cases, the algorithms show good capacity for taking advantage of parallelization, in that the median run time decreases as the number of available cores increases. The best algorithm varies by problem and by the number of cores available. Notably, with a large number of available cores (288), PSO (an asynchronous algorithm) is most effective for the benchmarks using stochastic simulators (BioNetGen SSA and NFsim) (Figures 3C and 3D). According to the Mann-Whitney U statistical test, for Problem 3, PSO is faster than aDE with p = 3.3 × 10⁻⁵ and faster than SS with p = 7.7 × 10⁻³. For Problem 4, PSO is faster than SS with p = 1.4 × 10⁻⁴. However, PSO is outperformed by aDE and SS for the other benchmarks (Figures 3A and 3B). For Problems 1 and 2, DE and aDE encountered convergence failures with small core counts because the corresponding population sizes were too small to effectively explore the parameter space.
Figure 3.
Results from Timed Benchmarking of PyBioNetFit
(A–D) Run times required to reach a target objective value are shown for our four selected benchmark problems (Table 2, Problems 1–4), for the DE, aDE, PSO, and SS algorithms implemented in PyBioNetFit. Box plots indicate the distribution of 20 replicates in (A–C) and 12 replicates in (D). Gray points represent results from individual replicates. Replicates that ran for the maximum wall time (3 h in A, 6 h in B–D) without reaching the target value are plotted in the “Wall” band. When calculating percentiles for box plots, “Wall” replicates were taken to be larger than any successful replicate. “CF” (convergence failure) gives the number of replicates, out of 20 total, that failed because the population converged to a single point that was worse than the target value. Box plot statistics do not include convergence failures. Box plots are not shown for settings in which more than half the replicates were convergence failures. Each pair of whiskers indicates the minimum and maximum. Each box indicates the quartiles. Each horizontal line indicates the median.
Variability in algorithm performance is expected when considering a broad range of problems. In the end, the best algorithm and level of parallelization are problem-specific and must be selected through trial and error. PyBioNetFit helps users in this regard by providing robust implementations of several algorithms, allowing for easy testing of different approaches.
Two additional metaheuristic algorithms are implemented in PyBioNetFit but not rigorously benchmarked: simulated annealing (SA) and the parallelized island-based differential evolution (iDE) algorithm of Penas et al. (2015). These algorithms were challenging to include in benchmarking because of the need to tune problem-specific parameters (temperature and step size in the case of SA and trade-offs between island size and number of islands versus the number of available cores in the case of iDE). Still, we include the implementations in PyBioNetFit with the hope that users will find them useful for specific problems.
Local Optimization
PyBioNetFit includes a parallelized implementation (Lee and Wiswall, 2007) of the simplex algorithm (Nelder and Mead, 1965), a gradient-free local search algorithm. The simplex algorithm may be used on its own or to refine the best fit obtained from any of the other algorithms. For rugged parameter landscapes, which we expect to be common for problems considered in PyBioNetFit, a gradient-free local search algorithm is unlikely to find the global minimum on its own. Therefore, our recommended use of the simplex algorithm is for refinement of an existing best fit.
Comparison to a Gradient-Based Optimization Method
Although we did not rigorously benchmark PyBioNetFit against other parameterization tools, we tested the gradient-based method of Data2Dynamics (Raue et al., 2015) on Problems 1 and 2 (Table 2) to obtain a rough view of how the performance of this tool compares to that of PyBioNetFit. Results are provided in Data S1 (Problems 1 and 2) and considered further in Discussion. We note that the results are provided only with the intention of concretely demonstrating discussion points.
Uncertainty Quantification
For Bayesian uncertainty quantification, PyBioNetFit offers two MCMC methods: the conventional MH algorithm and parallel tempering (PT). These methods are used by setting fit_type = mh or fit_type = pt in the CONF file, similar to how de is selected in Figure 2F, line 9. To validate the accuracy of PyBioNetFit, we used MH and PT with the model of mast cell signaling described by Harmon et al. (2017) (Table 2, Problem 6). Harmon et al. (2017) observed differences in mast cell degranulation as a function of the time delay between two pulses of antigen stimulation of IgE receptor activity. The model describes the activities of Syk and Ship1 during this two-stage stimulation protocol. The original study included Bayesian uncertainty quantification of model parameters and predictions using problem-specific code that implemented MH. We ran MH and PT in PyBioNetFit using input files provided in Data S1 (Problem 6). We found that both PyBioNetFit algorithms achieved good agreement with the published results for parameter uncertainty (Figures 4A–4F) and prediction uncertainty (Figures 4G–4L). For this problem, the MH and PT algorithms converged to the correct distribution at roughly the same rate. Convergence was checked by dividing the samples into two independent sets (sampled by different Markov chains) and confirming by inspection that the two sets of samples had similar distributions.
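For readers unfamiliar with the MH algorithm, its core update can be sketched generically (a toy one-dimensional example with a standard normal target distribution; this illustrates the algorithm itself, not PyBioNetFit's implementation):

```python
import math
import random

def neg_log_posterior(theta):
    """Toy target: -log p(theta) for a standard normal, up to a constant."""
    return 0.5 * theta * theta

def metropolis_hastings(n_samples, step=1.0, seed=0):
    """Minimal random-walk Metropolis-Hastings sampler."""
    rng = random.Random(seed)
    theta = 0.0
    energy = neg_log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        prop_energy = neg_log_posterior(proposal)
        # Accept with probability min(1, exp(-(E_new - E_old)))
        if prop_energy <= energy or rng.random() < math.exp(energy - prop_energy):
            theta, energy = proposal, prop_energy
        samples.append(theta)
    return samples

chain = metropolis_hastings(20000)
```

In PyBioNetFit, the role of `neg_log_posterior` is played by the objective function evaluated via model simulation; PT additionally propagates several such chains at different "temperatures" and exchanges states between them to improve mixing.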
Figure 4.
Bayesian Uncertainty Quantification in PyBioNetFit
(A–L) Results from PyBioNetFit's MH (B, E, H, and K) and PT (C, F, I, and L) algorithms are compared with the problem-specific code of Harmon et al. (2017) (A, D, G, and J). The data plotted in (A, D, G, and J) originally appeared in Harmon et al. (2017). (A–F) Marginal posterior probability distributions for selected parameters of the model of Harmon et al. (2017). Two examples of the 16 model parameters are shown. (G–L) Prediction uncertainty quantification for time courses of activated Ship1, one of the model outputs. Two antigen stimulation protocols are shown: one in (G–I) and the other in (J–L). Black bars above graphs indicate times when multivalent antigen was present. Solid curves indicate the median, and shaded areas indicate the 68% credible interval.
Note that to calculate the posterior probability distribution, the objective function is assumed to correspond to a negative log likelihood. This assumption is valid for the chi-square objective function, which was used in this example. Bayesian MCMC algorithms will not produce statistically meaningful results if used with PyBioNetFit's other available objective functions or when penalty terms arising from qualitative data are added to the objective function.
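To make the assumed correspondence concrete: under independent Gaussian measurement noise, minimizing the chi-square objective is equivalent to maximizing the likelihood. A schematic version in Python (the factor of 1/2 in this convention makes the objective equal the negative log likelihood up to an additive constant; this is an illustration, not PyBioNetFit's internal code):

```python
def chi_square(observed, simulated, sigma):
    """Sum over data points of (residual / sigma)^2 / 2. Up to a constant
    independent of the parameters, this equals -log L for independent
    Gaussian noise, so exp(-chi_square) is proportional to the likelihood
    sampled by the MCMC algorithms."""
    return sum(((o - s) / sd) ** 2
               for o, s, sd in zip(observed, simulated, sigma)) / 2.0
```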
When using Bayesian MCMC methods, it is important to choose algorithmic parameters such that the posterior distribution is sampled accurately. In particular, some number of unsampled “burn-in” iterations should be used to allow the Markov chains to reach a starting point in a region of high probability density. In addition, an adequately large number of iterations must be sampled for the Markov chains to fully explore the posterior distribution. The Gelman-Rubin statistic (Gelman and Rubin, 1992) is a popular quantitative test for convergence of sampling. Exploring the target distribution may be especially challenging when the distribution is multimodal, and it is a rare event for a Markov chain to move between modes. In these situations, PT is expected to outperform MH by providing a faster means of escape.
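The Gelman-Rubin statistic compares within-chain and between-chain variance; a minimal computation for a single parameter might look like this (a generic sketch of the standard formula, not PyBioNetFit code; values near 1 indicate that the chains are sampling the same distribution):

```python
from statistics import mean, pvariance

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a list of equal-length
    Markov chains sampling one parameter."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: mean within-chain variance; B/n: sample variance of the chain means
    w = mean(pvariance(c) for c in chains)
    b_over_n = pvariance(chain_means) * m / (m - 1)
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5
```

In practice, the statistic is computed for each parameter after discarding the burn-in samples.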
Run times of Bayesian MCMC algorithms are expected to be dominated by the run times of the large number of simulations required to adequately sample probability distributions. We therefore do not expect a noticeable difference in performance between different implementations of the same MCMC algorithm run with the same settings, aside from differences in simulator efficiency. PyBioNetFit is a convenient tool for running MCMC because it supports both BNGL and SBML models without the need for custom code. In addition, MCMC in PyBioNetFit takes advantage of parallelization. In MH, individual Markov chains are not parallelizable, but PyBioNetFit can run multiple independent Markov chains in parallel and pool the results to create a larger sample of a probability distribution. In PT, the algorithm requires the simultaneous propagation of several Markov chains, and these chains are run in parallel. The efficiency of MH and PT is known to decline in high-dimensional parameter spaces. Problem 6 has a 16-dimensional parameter space, which is the largest for which we have used these methods.
MH and PT are widely used algorithms, but they are not suitable for every problem. More sophisticated MCMC algorithms described elsewhere include (among many others) Hamiltonian Monte Carlo (Betancourt, 2017), a gradient-based method implemented in other tools such as the statistical software package Stan (Carpenter et al., 2017), and the differential evolution Markov Chain family of algorithms (ter Braak and Vrugt, 2008). These more advanced algorithms are not included in the initial PyBioNetFit release, but the extensibility of PyBioNetFit (described in the Section Continued Development of PyBioNetFit) may allow them to be added in future development.
PyBioNetFit offers bootstrapping (Efron and Tibshirani, 1993, Press et al., 2007) as another uncertainty quantification method. Bootstrapping relies on the assumption that the experimental data points are drawn from some (unknown) probability distribution and that drawing a sample from the data available is a good approximation of drawing a sample from the distribution.
Results of bootstrapping are typically reported as a “confidence interval” for the value of each parameter. Three important caveats must be kept in mind when interpreting this confidence interval. First, the interval refers specifically to the location of the best-fit parameter set. For this reason, bootstrapped intervals tend to be narrower than those obtained from a likelihood-based Bayesian approach (Fröhlich et al., 2014). In addition, if a parameter is unidentifiable, bootstrapping can yield a misleadingly narrow interval. Second, a bootstrapped confidence interval includes both uncertainty arising from the experimental data and uncertainty introduced by imperfect performance of the fitting algorithm used (unless the algorithm has perfect performance with respect to finding the global minimum). Thus, when we obtain a “90% confidence interval” from bootstrapping, it means that if the experiment were repeated, and the fitting were repeated using the new data, then the best-fit parameter would be expected to fall within the interval with 90% confidence. Third, bootstrapping relies on the assumption that a resampled dataset is a good approximation of repeating an experiment. This assumption may not be valid when the size of the original experimental dataset is small. Interested readers can find further discussion of the advantages and limitations of bootstrapping in Chernick and LaBudde (2011).
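The percentile bootstrap underlying this procedure can be sketched as follows (a generic illustration; in PyBioNetFit the estimator applied to each resampled dataset is an entire fitting run, not a simple statistic):

```python
import random
from statistics import mean

def bootstrap_interval(data, estimator, n_resamples=2000, alpha=0.10, seed=0):
    """Percentile bootstrap: apply the estimator to datasets resampled
    with replacement, then report the central (1 - alpha) interval."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    estimates.sort()
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_interval([2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9], mean)
```

Because each resample triggers a full optimization in the real workflow, the number of bootstrap replicates used in practice is typically far smaller than the 2,000 used in this toy example.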
To illustrate how bootstrapping can be used to measure uncertainty arising from different fitting algorithms, we consider a fitting problem consisting of an egg-shaped curve (Table 2, Problem 7), originally presented by Hlavacek et al. (2018). This toy problem is simple enough for PyBioNetFit's SS algorithm to find the global minimum, but the BioNetFit 1 algorithm is less effective. We performed bootstrapping on this problem with PyBioNetFit and found that the best fit for each parameter was identified to a precision of order 10⁻⁴ with 90% confidence (Data S1, Problem 7). This high level of precision is unsurprising, given that the input data consist of densely sampled points on the target curve with minimal noise. In contrast, 90% confidence intervals reported using BioNetFit 1 span large ranges, of order 1 in some cases (Hlavacek et al., 2018). We conclude that the uncertainty reported with BioNetFit 1 arises mainly from limitations of the fitting algorithm, rather than from limitations in the amount or quality of data for fitting. Again, results from bootstrapping would be independent of the optimizer if the optimizer were always able to find a unique global minimum, but this is not a realistic expectation for many problems.
In summary, the uncertainty quantification methods in PyBioNetFit provide different and complementary functionalities. The Bayesian MH and PT algorithms estimate a multidimensional probability distribution showing the most probable parameter values (treated as random variables) based on the data. Different Bayesian algorithms with the same input data are expected to produce the same results, as long as algorithmic settings allow for sufficient sampling of the posterior probability distribution. Bootstrapping evaluates the uncertainty given a fitting algorithm in combination with a particular dataset. The resulting bootstrap confidence interval represents the confidence in the best-fit parameter values if both the experiment and the fitting were to be repeated.
Application: Fitting a Model of Yeast Cell Cycle Control Using Both Qualitative and Quantitative Data
To demonstrate the capabilities of PyBioNetFit to parameterize models using both qualitative and quantitative data, we used PyBioNetFit to re-solve a challenging, published parameterization problem involving a model of yeast cell cycle control developed by Tyson and co-workers (Chen et al., 2000, Chen et al., 2004, Csikász-Nagy et al., 2006, Kraikivski et al., 2015, Oguz et al., 2013). Early versions of this model were parameterized by hand-tuning (Chen et al., 2000, Chen et al., 2004), and later by problem-specific code with a search space informed by previous hand-tuned results (Oguz et al., 2013). In our previous work, we used problem-specific code to parameterize the model ab initio (Mitra et al., 2018). Our problem formulation used the model described by Oguz et al. (2013) and Laomettachit (2011), incorporating the qualitative data tabulated by Laomettachit et al. (2016) and the quantitative data of Spellman et al. (1998) (Table 2, Problem 8). Our goal in this work was to use PyBioNetFit to obtain a similar quality of fit to previous work. Because this example serves primarily as an illustration of PyBioNetFit functionality, we configured the problem to be identical to the previous study (Mitra et al., 2018) in terms of models, datasets, and objective function.
PyBioNetFit contains all the features needed to repeat the fitting job of Mitra et al. (2018). The input files to run the fitting job are provided as Data S1 (Problem 8). As in the original study, we performed optimization using scatter search, as described in Transparent Methods.
We ran the fitting job in PyBioNetFit, and the resulting fit was of similar quality to that of previous work. We present a subset of the results in Figure 5 and the parameterized model in Data S1 (Problem 8). Our reported fit is the best result from 40 independent replicates. Convergence plots for all 40 replicates are shown in Figure S1. We achieved a minimum objective function value of 80, compared with 70 in Mitra et al. (2018). A difference is not surprising given the stochastic nature of the SS algorithm (or any metaheuristic). For comparison, our best objective function value from a starting sample of 500 random parameter sets was 5,493. Our fit is not identical to the previously published fit, which is expected because some model parameters were shown not to be identifiable (Mitra et al., 2018). However, like the published fit, the fit generated by PyBioNetFit shows reasonable consistency with the qualitative data (Figures 5A–5F). In five of the six example panels shown, the parameterized model is consistent with the constraint indicated by the horizontal lines (as described in the figure caption). In one panel (Figure 5E), the time course is inconsistent with the constraint, illustrating that although most constraints are satisfied by our best fit, not all are satisfied. The fit found by PyBioNetFit also captures certain features of the quantitative data (Figures 5G–5L), such as the location of the peaks in (G) and (H). A more rigorous analysis of the misfit to quantitative data would require information about experimental measurement error, which was not available for this dataset.
Figure 5.
Example Outputs of the Model for Yeast Cell Cycle Control Parameterized with PyBioNetFit
(A–F) Selected output showing agreement with qualitative data. Two output variables are shown: V (A–C), representing cell volume, and ORI (D–F), a flag that indicates origin activation is completed when its value reaches 1. Results for three selected yeast strains are shown: wild-type (A and D), which is viable; a mutant (cln3Δ bck2Δ) (B and E), which has a G1 arrest phenotype; and another mutant (cdc14-ts) (C and F), which has a telophase arrest phenotype. Horizontal lines indicate qualitative constraints: time courses should exceed black dash-dot lines and should not exceed red dashed lines.
(G–L) Selected output showing agreement with quantitative data of Spellman et al. (1998) (red diamonds). These plots were shown in Mitra et al. (2018) with the best-fit results obtained in that study. Gene expression levels are shown for CLB2T (G), CLN2 (H), CKIT (I), CDC20T (J), PDS1T (K), and CLB5T (L).
See also Figure S1.
Applications beyond Fitting: Model Checking
Although PyBioNetFit was designed for model parameterization, the property specification language of PyBioNetFit has additional applications in the analysis of parameterized models, namely, model checking and design. To demonstrate these applications, we consider the model of Shirin et al. (2019) (Figure 6A). The model describes the interactions between four kinases involved in the regulation of autophagy, a cellular recycling process. The model also describes the effects of six types of drugs in modulating these interactions and the level of autophagy. In the original study, this model was used to investigate the capabilities of the six drugs (labeled D1 through D6) to control the number of autophagic vesicles (AVs) in a cell. For our analysis, we assume that the published parameterization of the model, which was shown to be consistent with certain experimental data in the original study, is acceptable.
Figure 6.
Applications of PyBioNetFit in Analysis of a Parameterized Model
(A) Schematic of the model to be analyzed, adapted from Shirin et al. (2019). Six drugs labeled D1 through D6 are capable of modulating various processes in the network shown. Interactions among kinases considered in the model are numbered 1–8.
(B) Model checking performed by PyBioNetFit of hypothetical alternatives to the model shown in (A). Each alternative model version (numbered 1–8) was obtained by removing one interaction from (A), corresponding to the version number. Each model version was checked against 10 qualitative behaviors characterized by Shirin et al. (2019): the change (increase or decrease) in AV count in response to a particular drug at a particular stress level, high (++) or medium (+).
(C) Optimizing drug dosing to achieve a desired system behavior. This table shows the minimal constant drug concentration to reduce AV count to 20 per cell or below, in a cell under high stress. Gray rows show optimized doses for each drug individually. The blue row shows the optimized dose by simultaneously tuning all six drug concentrations. Although all six drug doses were allowed to vary, the optimal solution had only two drugs with nonzero dose.
(D) Time course of AV counts under the treatments shown in (C). The gray broken line shows the response to treatment with D3 only, the gray dash-dot line shows the response to treatment with D4 only, and the blue solid line shows the response to the optimized six-drug dose. Responses to D2 or D6 only are indistinguishable from the response to the optimized dose (but require more total drug than with the optimized drug combination).
Model checking, as defined here, consists of evaluating whether a particular model satisfies a set of specified properties. To illustrate model checking, we considered eight hypothetical alternatives to the model of Shirin et al. (2019), each obtained by removing one of the labeled interactions from the network illustrated in Figure 6A. These changes are arbitrary for demonstration of the model checking workflow, but represent a scenario that could arise in practice: often, many models of the same biological process are developed by different research groups for different purposes, and a particular interaction might be present in one model but absent in another. In such a scenario, it is reasonable to ask whether the interaction is important to the model's ability to reproduce certain system properties. As system properties to be checked in our demonstration, we use the characterization of the system's response to drug treatment by Shirin et al. (2019), which we take to be the established truth. Specifically, for the six drug treatments allowed in the model and two levels of cellular stress (determined by the energy and nutrient parameters of the model, CEn and CNu), Shirin et al. (2019) characterized the change in the number of AVs relative to control. Ten of these 12 model settings resulted in an increase or decrease in AV count. For our model checking exercise, we determined whether each of our hypothetical alternative models is able to reproduce these 10 qualitative behaviors.
To perform model checking in PyBioNetFit, we must express each property of interest in BPSL. For this model, properties can be written as inequalities between the AV count for the untreated case and AV counts for the drug-treated cases. In PyBioNetFit, this type of case-control comparison is configured by performing simulations corresponding to multiple versions of the model—here, one version for each stress/drug combination considered plus one version at each stress level with no drug. As described in the section Workflow Enabled by PyBioNetFit, PyBioNetFit requires each of these simulations to have a unique suffix (a string defined in the BNGL or CONF file). These suffixes can be used in the PROP file to refer to the outputs of specific simulations. For example, suppose that the simulation of wild-type has the suffix data, the simulation with drug D2 has suffix data_D2, and after the system has equilibrated, the AV count should be lower in the presence of D2. We assume that the system is equilibrated in the time window of 120–240 min. Then the constraint would be written as
data.AV > data_D2.AV between 120,240
The full implementation of the model checking problem in PyBioNetFit is provided as Data S1 (Problem 9).
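Properties for the remaining treatments follow the same pattern. For example, with hypothetical suffixes data_D4 and data_D5 and illustrative inequality directions (the actual directions depend on whether each drug increases or decreases the AV count in the published characterization):

```
data.AV > data_D4.AV between 120,240
data_D5.AV > data.AV between 120,240
```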
The results of model checking are shown in Figure 6B. Four of the variant models (versions 1–4) remain consistent with all 10 system properties, whereas the other four (versions 5–8) no longer satisfy one or more of the properties. In the context of this model, these results suggest that interactions 1–4 in Figure 6A are not essential to the qualitative properties that we considered. More generally, this example demonstrates the ability of PyBioNetFit's model checking utility to help distinguish between models.
Applications beyond Fitting: Design
In a design problem, we seek perturbations of a system to achieve a set of desired properties defined in BPSL. To illustrate a design problem in PyBioNetFit, we consider a problem similar to one addressed in the original study of Shirin et al. (2019). Namely, we want to choose the concentrations of drugs D1 through D6 so as to drive the AV count below a desired threshold, while minimizing the total quantity of drug used. We arbitrarily choose a threshold of 20 AVs and set CEn = CNu = 0.1 (on a scale of 0–1), corresponding to a high level of cellular stress. In the original study, arbitrary time courses of drug dosing were permitted and simultaneous dosing of up to two drugs at a time (out of the six drugs in the model) was considered. Here, we solve a different problem in which we limit ourselves to constant drug concentrations, but allow for simultaneous dosing of up to six drugs.
We configure this problem as a fitting problem in PyBioNetFit, in which the free parameters to be estimated represent the unknown concentrations of each of the six drugs. The desired system property of reducing AV count below 20 is implemented as an inequality constraint, and the goal to minimize drug concentration is implemented as a quantitative data point (i.e., minimizing the difference between the actual total drug dose and 0). The full configuration of this problem is provided as Data S1 (Problem 9).
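Conceptually, the resulting objective combines the two terms. The sketch below shows one way to write such an objective in Python (a schematic of the penalty idea, not PyBioNetFit's internal code; `av_count` stands for the simulated AV count at the candidate doses, and the penalty weight is an arbitrary illustrative value):

```python
def design_objective(doses, av_count, threshold=20.0, penalty_weight=1e3):
    """Schematic objective for the dosing design problem: total drug dose
    is the quantitative term (target value 0), and a penalty term enforces
    the qualitative constraint that the AV count stays at or below the
    threshold."""
    total_dose = sum(doses)
    violation = max(0.0, av_count - threshold)
    return total_dose + penalty_weight * violation
```

With this construction, any dose vector that violates the AV threshold is heavily penalized, so the optimizer first satisfies the qualitative property and then minimizes the total dose.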
The optimization results are shown in Figure 6C (bottom row). For comparison, we also performed optimizations in which only one of the drug concentrations was allowed to vary (Figure 6C). The results are consistent with those reported by Shirin et al. (2019). Note that the optimized combined dose allowing all six drugs uses less total drug than any of the single-drug doses. The optimized dosing schemes for both single-drug and combination treatments achieve the desired property of driving AV count below 20 (Figure 6D).
This example illustrates an additional, important class of problems that can be addressed with PyBioNetFit: the design of perturbations to a biological system to achieve specified behavior. More specifically, the example illustrates optimization of targeted drug treatments, which has been a long-standing goal in systems biology (Fitzgerald et al., 2006). The automated design of perturbations, with formal definition of target behavior, is systematic and less likely to miss effective perturbations than an ad hoc approach to model analysis.
Discussion
Comparison to Related Tools
Some features of PyBioNetFit are novel, whereas others overlap with those of other available optimization tools. Here we discuss the strengths and weaknesses of PyBioNetFit in comparison with other published tools.
As PyBioNetFit was designed for parameterization of rule-based models written in BNGL, our primary comparison is with PyBioNetFit's predecessor, BioNetFit 1 (Thomas et al., 2016), which was previously state of the art for this application domain. In particular, no other tools to our knowledge support parameterization of models simulated with BioNetGen's stochastic algorithms (the BioNetGen SSA and NFsim). PyBioNetFit improves on BioNetFit 1 both by adding new functionality and by providing better implementations of BioNetFit 1's existing functionality.
In our experience, PyBioNetFit far outperforms BioNetFit 1. As one example comparison, we ran Problem 2 in BioNetFit 1 with population size 144 (Data S1, Problem 2). We considered using the cluster-computing capabilities of BioNetFit 1 but found that fitting ran faster on a single node (due to overhead in communicating with the cluster manager). BioNetFit 1 was unable to reach the target objective value within 10 h in any of five fitting replicates. For comparison, the fastest PyBioNetFit algorithm at a parallel count of 144 on a cluster had a median run time of 1.9 h (Figure 3B), which is significantly better performance by the Mann-Whitney U test (p = 1.1 × 10⁻³).
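To make the statistical comparison concrete, here is a minimal pure-Python computation of the Mann-Whitney U statistic. The run times below are hypothetical stand-ins, not the actual benchmark measurements.

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for sample xs versus sample ys: the number
    of pairs (x, y) with x < y, with ties contributing 0.5 per pair."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

pybnf_hours = [1.7, 1.9, 2.1, 2.4, 2.6]      # hypothetical replicate run times
bnf1_hours = [10.0, 10.0, 10.0, 10.0, 10.0]  # runs censored at the 10-h limit
print(mann_whitney_u(pybnf_hours, bnf1_hours))  # → 25.0
```

With complete separation of two samples of five, U reaches its maximum of n·m = 25; a statistics package (e.g., scipy.stats.mannwhitneyu) would then report the corresponding p value.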
A larger set of software is available for parameterization of ODE models defined in SBML. For smaller ODE models, we recommend gradient-based methods implemented in other tools as a starting point, as these algorithms tend to be more efficient than metaheuristics for problems where they are feasible (Raue et al., 2013). Data2Dynamics (Raue et al., 2015) uses forward sensitivity analysis (Leis and Kramer, 1988) to calculate the gradient of the objective function. Its default optimizer, which the developers recommend for most applications (Raue et al., 2013), is MATLAB's lsqnonlin function (which implements a trust region-reflective algorithm, MathWorks, 2018). Gradient-based methods are also supported in COPASI (Hoops et al., 2006), which calculates gradients by the finite difference approximation. We chose not to include gradient-based methods in PyBioNetFit at this time because existing tools already provide acceptable solutions.
In rugged parameter landscapes, gradient-based methods are susceptible to becoming trapped in local minima and slowed near saddle points. This issue can be addressed by performing multiple optimization runs at different start points but can become limiting if the parameter space has too many local minima. Metaheuristic algorithms can also become trapped in local minima, but experience suggests that they are more capable of escape than gradient-based methods. Various factors likely contribute to this capability, including uphill moves, random behavior, and exchange of information between multiple searchers.
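The role of uphill moves can be illustrated with a toy simulated-annealing-style search (a generic sketch, not one of PyBioNetFit's algorithms). The function below has a shallow local minimum near x = 1 and a deeper global minimum near x = -1; occasional acceptance of uphill moves lets a search started in the shallow basin cross the barrier, whereas a pure downhill search would stay trapped.

```python
import math
import random

def f(x):
    """Double-well test function: shallow minimum near x = 1 (f ≈ 0.20),
    deeper global minimum near x = -1 (f ≈ -0.20)."""
    return (x * x - 1.0) ** 2 + 0.2 * x

def anneal(x0, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        fc = f(cand)
        # Always accept downhill moves; accept uphill moves with probability
        # exp(-increase / temperature), which shrinks as the temperature cools
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f

best_x, best_f = anneal(1.0)  # start in the shallow basin
print(best_x, best_f)
```

In a typical run the best point found lies in the deeper basin near x = -1, i.e., the search escapes the local minimum where it started.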
We expect forward sensitivity analysis to perform well for ODE problems on the typical scale of the problems benchmarked by Hass et al. (2019). This method has been shown to scale roughly linearly with respect to the number of free parameters (Kapfer et al., 2019). The cost also depends on the number of equations. Scaling with respect to number of equations is of particular interest for rule-derived ODE models because such models often result in many more equations than typically arise in manually formulated ODE models. Even fairly simple rule-based models (in terms of number of parameters and rules defined) can imply hundreds to thousands of differential equations.
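The scaling argument can be made concrete: forward sensitivity analysis augments the m state equations with one sensitivity equation per state per free parameter, so the integrated system has m(1 + n) equations for n free parameters. A trivial sketch (our own illustration, using the problem sizes quoted in the benchmarking discussion):

```python
def forward_sensitivity_system_size(n_equations, n_parameters):
    """Number of ODEs integrated under forward sensitivity analysis: the
    original m state equations plus m sensitivity equations per parameter."""
    return n_equations * (1 + n_parameters)

# Problem 1: conventional ODE model (30 equations, 46 free parameters)
print(forward_sensitivity_system_size(30, 46))   # → 1410
# Problem 2: rule-derived ODE model (356 equations, 37 free parameters)
print(forward_sensitivity_system_size(356, 37))  # → 13528
```

For the rule-derived model, the augmented system is nearly an order of magnitude larger than for the conventional model, even though the rule-derived model has fewer free parameters.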
To illustrate the scaling behavior (with respect to number of ODEs) of forward sensitivity analysis as implemented in Data2Dynamics, we measured the run time of optimization on the ODE models of Problems 1 and 2, which were also used to benchmark PyBioNetFit. We note that Data2Dynamics can run multiple independent optimization runs in parallel to improve the probability of finding a good solution, but, in contrast to the parallelization of metaheuristic algorithms, this parallelization cannot improve the wall time of an individual run. On Problem 1 (a conventional ODE model with 30 equations and 46 parameters), Data2Dynamics completed optimization in 4 min, compared with median run times ranging from 11 to 14 min on 288 cores for the four metaheuristic algorithms of PyBioNetFit. Multiple runs of Data2Dynamics on this problem suggested that there is not large variability in run times between runs. On Problem 2 (a rule-derived ODE model with 356 equations and 37 parameters), Data2Dynamics required 8 h to complete one optimization run, compared with median run times ranging from 1.5 to 3.6 h on 288 cores for the four algorithms of PyBioNetFit. In this case, Data2Dynamics used a significant amount of run time simply for setup of the forward sensitivity equations. Of course, one cannot draw broad conclusions based on the results of only two problems. However, these results are consistent with what we would expect given that integration of ODE systems with many equations is costly, and forward sensitivity analysis requires more expensive integration (to calculate sensitivities with respect to each parameter) than is needed for simple objective function evaluation. The illustrated behavior is also what we would expect for gradient-based optimization in COPASI.
Recent work has demonstrated that adjoint sensitivity analysis can be effective for gradient computation for larger ODE models when forward sensitivity analysis is inefficient (Fröhlich et al., 2018). This approach has been used to solve a parameterization problem with 1,200 equations and 4,100 parameters (Fröhlich et al., 2018), which is a larger scale than we have considered with PyBioNetFit. To the best of our knowledge, adjoint methods have yet to be demonstrated for rule-derived ODE systems (or any system with many more equations than free parameters), but the good scaling properties of adjoint methods suggest such an approach would be feasible. The package AMICI (Fröhlich et al., 2017) supports adjoint sensitivity analysis for biological applications but offers only limited workflows. (Adjoint sensitivity analysis is also available in some general-purpose ODE solvers, Rackauckas et al., 2018.) AMICI is designed for use with time-series data with a known initial condition. Model parameterization can be performed by writing code to use AMICI in combination with the optimization toolbox PESTO (Stapor et al., 2018). We recommend that AMICI/PESTO be used for parameterizing large ODE models (hundreds of differential equations or larger) if the available workflows support the problem of interest.
A unique feature of PyBioNetFit is its support for a domain-specific property specification language (BPSL). To the best of our knowledge, no other biological modeling tool has comparable functionality for specification of qualitative properties. Previous work on biological property specification (Clarke et al., 2008, David et al., 2012, Heath et al., 2008, Hussain et al., 2015, Khalid and Jha, 2018, Kwiatkowska et al., 2008, Liu and Faeder, 2016) relied on bespoke software, whereas BPSL can be used with the general-purpose functionality of PyBioNetFit. BPSL is also designed to be more human readable than, for instance, conventional linear temporal logic (LTL). We expect a BPSL statement (but not an LTL expression) to be understandable to anyone with a background in biological modeling. For example, consider the following BPSL statement:
A < 1 between B = 2, B = 3
This statement is equivalent to the following LTL expression:
F(B = 2) ⇒ ((¬(B = 2)) U (B = 2 ∧ ((A < 1) W (B = 3))))
where F is the “future” operator, U is the “until” operator, and W is the “weak until” operator. A drawback of BPSL relative to LTL is that the available enforcement keywords (Table 1) enable only a subset of what is possible with LTL. However, the current BPSL grammar is sufficient to support all constraints formulated in Mitra et al. (2018) to fit the yeast cell cycle model of Oguz et al. (2013) and Laomettachit (2011). PyBioNetFit was written with extensibility in mind, such that it is possible to add to the BPSL grammar as needs arise in other modeling problems. We also note that although PyBioNetFit is the first tool to support BPSL, information represented in BPSL need not be tied to one model or software tool. In the future, it will be possible for us or others to develop additional tools compatible with BPSL.
Figure 7 summarizes the niche filled by PyBioNetFit in relation to other software supporting complete fitting workflows for biological models. PyBioNetFit is unique in its support for qualitative data (including model checking and design applications) and for its built-in, well-engineered support for cluster computing. PyBioNetFit is also notable for its multiple algorithm options that provide algorithm-level parallelization. BioNetFit 1 provides only one such algorithm, and other tools support only parallelization of independent runs. PyBioNetFit has the largest overlap in functionality with its predecessor BioNetFit 1, but as described above, PyBioNetFit far outperforms BioNetFit 1 in head-to-head comparisons. PyBioNetFit is recommended over BioNetFit 1 for all overlapping features, including parameterization of stochastic models. Data2Dynamics and COPASI tend to have use cases distinct from PyBioNetFit, such as for ODE problems that benefit from gradient-based optimization using forward sensitivity analysis or the finite difference approximation.
Figure 7.
Venn Diagram Comparing the Functionality Provided in PyBioNetFit with that of Three Other Programs Supporting Parameterization of Biological Models
The abbreviations in the diagram stand for the following features: AP, algorithm-level parallelization: each algorithm step runs multiple objective function evaluations in parallel; BNGL, support for BNGL models; BS, bootstrapping; BUQ, Bayesian uncertainty quantification; CL, command-line interface; CN, native support for cluster computing. (Although any program can be run on a cluster with sufficient configuration by the user, PyBioNetFit was designed for this purpose. Its documentation includes instructions for how to run the program on multiple cluster nodes, and we have demonstrated this use case with up to 8 nodes, 288 cores.) FD, finite difference approximation; FS, forward sensitivity analysis; G, gradient-based algorithms; GUI, GUI for configuring and running fitting; M, metaheuristic algorithms; MA, multiple algorithm options available; MATLAB, MATLAB interface; NC, free with no commercial dependencies; ODE, support for ODE models; PL, profile likelihood; QD, fitting with qualitative data (including model checking and design applications); SBML, support for SBML models; SM, support for stochastic models.
Comparison to Problem-Specific Coding
PyBioNetFit joins Data2Dynamics and COPASI in the class of software supporting standardized biological model-definition formats and complete workflows for model parameterization, and it has strengths complementary to these existing tools. These free-standing applications contrast with the approach of using problem-specific code written in a high-level programming language such as Python, R, or MATLAB. We acknowledge that problem-specific code is a good choice in some use cases, such as when the model of interest is already implemented in one of these languages, or when analyzing a model with an unusual feature that is not supported in SBML or BNGL. Many packages are available that can help streamline model parameterization in high-level programming languages. For coding a model, one could use standard differential equation packages, or PySB (Lopez et al., 2013), a package for building biological models in Python. Gradient-based algorithms with forward sensitivity analysis are available in dMod (Kaschek et al., 2019). Other packages implement metaheuristic optimization algorithms (Egea et al., 2014, Garrett, 2012, Fortin et al., 2012) and Bayesian uncertainty quantification algorithms (Eydgahi et al., 2013, Gupta et al., 2018, Shockley et al., 2018). AMIGO (Balsa-Canto et al., 2016) is a notable MATLAB optimization toolbox. Packages such as dask.distributed (Rocklin, 2015) are available to help with parallelization on clusters.
Even with the sophistication of these packages, some custom code is necessary to solve a given problem of interest. We argue that in cases where writing BNGL or SBML models is feasible, the functionality of PyBioNetFit is preferable to problem-specific code. PyBioNetFit combines all the functionality required for model parameterization into a single package. It removes the need for debugging at the level of the programming language, which reduces the propensity for errors in the modeling work. PyBioNetFit allows a modeler to instead focus on designing models and choosing appropriate algorithms for parameterization and analysis. BPSL facilitates consideration of qualitative data, improving on our published approach using problem-specific code (Mitra et al., 2018).
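The shape of a combined quantitative/qualitative objective function can be sketched as follows (illustrative Python, not PyBioNetFit internals; the weight and the linear per-violation penalty form are assumptions). Quantitative data contribute a residual sum of squares, and each violated qualitative property contributes a static penalty, in the spirit of the approach of Mitra et al. (2018):

```python
# Sketch of a combined objective; weight and penalty form are assumptions.

def combined_objective(residuals, violations, weight=100.0):
    """residuals: (simulated - observed) for each quantitative data point.
    violations: violation magnitude for each qualitative property
    (0 when the property is satisfied)."""
    sse = sum(r * r for r in residuals)        # quantitative contribution
    penalty = weight * sum(violations)         # qualitative contribution
    return sse + penalty

# All properties satisfied: objective reduces to the residual sum of squares
print(combined_objective([0.1, -0.2], [0.0, 0.0]))  # ≈ 0.05
# One violated property adds a weighted penalty on top of the fit error
print(combined_objective([0.1, -0.2], [0.3, 0.0]))  # ≈ 30.05
```

In PyBioNetFit, the modeler never writes such code: the quantitative terms come from data files and the qualitative terms from BPSL declarations, with the combination handled by the tool.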
A second advantage of using PyBioNetFit is in the reproducibility of results (Medley et al., 2016, Waltemath and Wolkenhauer, 2016). Although it is possible to create well-documented, reproducible problem-specific code, using IPython or R notebooks, for example, such good practices are not always followed. Often problem-specific code is developed to be run on a specific machine, without portability in mind. Concerns of expedience dominate the coding effort. In contrast, PyBioNetFit achieves a separation of concerns, in which a job can be documented by providing the set of input files used, along with the version number of the code, and there is no need to disentangle this information from the implementation of any algorithm.
Continued Development of PyBioNetFit
PyBioNetFit is released open source on GitHub (https://github.com/lanl/PyBNF) with the hope that we and others will continue to improve PyBioNetFit. The GitHub page includes an active issue tracker that facilitates reporting of bugs and feature requests.
We welcome contributions to PyBioNetFit from the community. We designed PyBioNetFit such that it should be straightforward to implement additional optimization and MCMC algorithms, as we are aware that many such algorithms are described in the literature. The PyBioNetFit documentation (Mitra and Suderman, 2019) includes instructions for contributing new algorithms to the PyBioNetFit code base.
Conclusion
PyBioNetFit offers a versatile set of tools, which we expect to be useful in parameterization of new biological models. PyBioNetFit is best in class for BNGL-formatted models, and notable for its support for stochastic biological models. PyBioNetFit supports several workflows, including fitting to time-series data, dose-response data, and qualitative data. We provide the first available implementation of our recent approach (Mitra et al., 2018) for leveraging both quantitative and qualitative data in a single parameterization problem. This approach is enabled by BPSL, which can also be used for model checking and design. The workflows supported in PyBioNetFit can be used for parameterizing standard ODE models, although for this application, gradient-based tools may be more efficient.
Our hope is that PyBioNetFit lowers the technical barrier to parameter fitting, by enabling fitting without problem-specific coding. PyBioNetFit will promote reproducible modeling by encouraging the use of existing model standards (BNGL and SBML).
We have shown that parameter identification can be challenging, and the best choice of fitting algorithm is not always obvious. By providing robust implementations of several algorithms, we encourage experimentation with different algorithms and settings to find the best choice for a problem of interest.
Limitations of the Study
PyBioNetFit can solve a wide variety of biological modeling problems, but is not the best solution for every problem. As described in the main text, many ODE models are more effectively parameterized using gradient-based algorithms. In addition, PyBioNetFit's metaheuristic algorithms can find fits that appear reasonable, but cannot guarantee that a global optimum has been reached. Parameterization using qualitative data, as implemented in BPSL, has the limitation that the objective function lacks a statistical interpretation, and so cannot be used in Bayesian uncertainty quantification algorithms.
Methods
All methods can be found in the accompanying Transparent Methods supplemental file.
Acknowledgments
This work was supported by grant R01GM111510 from the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH). W.S.H. acknowledges support from the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of NIH. R.S. and A.I. acknowledge support from the Center for Nonlinear Studies at Los Alamos National Laboratory (LANL), which is operated for the National Nuclear Security Administration (NNSA) of the DOE under contract 89233218CNA000001. H.M.S. acknowledges the support of grant R01GM123032 from NIGMS/NIH and grant P41EB023912 from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of NIH. We thank J. Kyle Medley and Kiri Choi for assistance with libRoadRunner development. We thank Adrian Hauber for assistance with Data2Dynamics. Computational resources used in this study included the Darwin cluster at LANL, which is supported by the Computational Systems and Software Environment (CSSE) subprogram of the Advanced Simulation and Computing (ASC) program at LANL, funded by NNSA/DOE; resources provided by the LANL Institutional Computing program, funded by NNSA/DOE; and Northern Arizona University’s Monsoon computer cluster, funded by Arizona’s Technology and Research Initiative Fund.
Author Contributions
W.S.H. and R.G.P. designed the study. E.D.M. and R.S. wrote the software. A.I. performed alpha testing. E.D.M. and J.C. performed benchmarking. A.H. and H.S. upgraded libRoadRunner to enable integration into PyBioNetFit. E.D.M. and W.S.H. wrote the manuscript with input from the other authors. All authors read and approved the final manuscript.
Declaration of Interests
The authors declare no competing interests.
Published: September 27, 2019
Footnotes
Supplemental Information can be found online at https://doi.org/10.1016/j.isci.2019.08.045.
Data and Code Availability
The most recent version of PyBioNetFit is v1.0.1, available online at https://github.com/lanl/PyBNF. The repository includes a user manual, Documentation_PyBioNetFit.pdf. The same user manual is available online as a standalone website (Mitra and Suderman, 2019). General information about PyBioNetFit is available at http://bionetfit.nau.edu/.
PyBioNetFit can be installed on any current Linux, macOS, or Windows computer, as well as on Linux clusters. Installation of Python 3 is required if it is not already included with the operating system. Root access is not usually required, allowing for PyBioNetFit to be readily installed on shared clusters. PyBioNetFit can be installed from source by downloading the code at the above GitHub link, or can be installed directly using the pip package manager with the command
python3 -m pip install pybnf
Data associated with the example fitting problems (Table 2) are provided as Data S1 and are also available online at https://github.com/RuleWorld/RuleHub/tree/2019Aug21/Published/Mitra2019. MCMC samples associated with Figure 4 are available in the BioStudies database (http://www.ebi.ac.uk/biostudies) under accession number S-BSST240.
Supplemental Information
This ZIP file contains 31 folders, corresponding to the 31 problems shown in Table 2. Each folder contains a README file describing the problem in more detail; all model, data, and configuration files needed to run the problem in PyBioNetFit; and example output for each algorithm tested on the problem.
References
- Balsa-Canto E., Henriques D., Gábor A., Banga J.R. AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology. Bioinformatics. 2016;32:3357–3359. doi: 10.1093/bioinformatics/btw411.
- Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv. 2017. https://arxiv.org/abs/1701.02434
- Blinov M.L., Faeder J.R., Goldstein B., Hlavacek W.S. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics. 2004;20:3289–3291. doi: 10.1093/bioinformatics/bth378.
- Blinov M.L., Faeder J.R., Goldstein B., Hlavacek W.S. A network model of early events in epidermal growth factor receptor signaling that accounts for combinatorial complexity. BioSystems. 2006;83:136–151. doi: 10.1016/j.biosystems.2005.06.014.
- Boehm M.E., Adlung L., Schilling M., Roth S., Klingmüller U., Lehmann W.D. Identification of isoform-specific dynamics in phosphorylation-dependent STAT5 dimerization by quantitative mass spectrometry and mathematical modeling. J. Proteome Res. 2014;13:5685–5694. doi: 10.1021/pr5006923.
- Suderman R., Mitra E.D., Lin Y.T., Erickson K.E., Feng S., Hlavacek W.S. Generalizing Gillespie’s direct method to enable network-free simulations. Bull. Math. Biol. 2019;81:2822–2848. doi: 10.1007/s11538-018-0418-2.
- Brännmark C., Palmér R., Glad S.T., Cedersund G., Strålfors P. Mass and information feedbacks through receptor endocytosis govern insulin signaling as revealed using a parameter-free modeling framework. J. Biol. Chem. 2010;285:20171–20179. doi: 10.1074/jbc.M110.106849.
- Cao Y., Li S., Petzold L. Adjoint sensitivity analysis for differential-algebraic equations: algorithms and software. J. Comput. Appl. Math. 2002;149:171–191.
- Carpenter B., Gelman A., Hoffman M.D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., Riddell A. Stan: a probabilistic programming language. J. Stat. Softw. 2017;76:1–32. doi: 10.18637/jss.v076.i01.
- Chen K.C., Csikasz-Nagy A., Gyorffy B., Val J., Novak B., Tyson J.J. Kinetic analysis of a molecular model of the budding yeast cell cycle. Mol. Biol. Cell. 2000;11:369–391. doi: 10.1091/mbc.11.1.369.
- Chen K.C., Calzone L., Csikasz-Nagy A., Cross F.R., Novak B., Tyson J.J. Integrative analysis of cell cycle control in budding yeast. Mol. Biol. Cell. 2004;15:3841–3862. doi: 10.1091/mbc.E03-11-0794.
- Chernick M.R., LaBudde R.A. An Introduction to Bootstrap Methods with Applications to R. John Wiley & Sons; 2011.
- Chib S., Greenberg E. Understanding the Metropolis-Hastings algorithm. Am. Stat. 1995;49:327–335.
- Choi K., Medley J.K., König M., Stocking K., Smith L., Gu S., Sauro H.M. Tellurium: an extensible python-based modeling environment for systems and synthetic biology. BioSystems. 2018;171:74–79. doi: 10.1016/j.biosystems.2018.07.006.
- Chylek L.A., Harris L.A., Tung C.-S., Faeder J.R., Lopez C.F., Hlavacek W.S. Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems. Wiley Interdiscip. Rev. Syst. Biol. Med. 2013;6:13–36. doi: 10.1002/wsbm.1245.
- Chylek L.A., Akimov V., Dengjel J., Rigbolt K.T.G., Hu B., Hlavacek W.S., Blagoev B. Phosphorylation site dynamics of early T-cell receptor signaling. PLoS One. 2014;9:e104240. doi: 10.1371/journal.pone.0104240.
- Clarke E.M., Emerson E.A., Sistla A.P. Automatic verification of finite state concurrent system using temporal logic specifications. ACM Lett. Program Lang. Syst. 1986;8:244–263.
- Clarke E.M., Grumberg O., Peled D., Belta P.C. Model Checking. MIT Press; 1999.
- Clarke E.M., Faeder J.R., Langmead C.J., Harris L.A., Jha S.K., Legay A. Statistical model checking in BioLab: applications to the automated analysis of T-cell receptor signaling pathway. In: Heiner M., Uhrmacher A.M., editors. Computational Methods in Systems Biology. Springer; 2008. pp. 231–250.
- Csikász-Nagy A., Battogtokh D., Chen K.C., Novák B., Tyson J.J. Analysis of a generic model of eukaryotic cell-cycle regulation. Biophys. J. 2006;90:4361–4379. doi: 10.1529/biophysj.106.081240.
- Danos V., Laneve C. Formal molecular biology. Theor. Comput. Sci. 2004;325:69–110.
- David A., Larsen K.G., Legay A., Mikučionis M., Poulsen D.B., Sedwards S. Runtime verification of biological systems. In: Margaria T., Steffen B., editors. Leveraging Applications of Formal Methods, Verification and Validation. Technologies for Mastering Change. Springer; 2012. pp. 388–404.
- Dunster J.L., Byrne H.M., King J.R. The resolution of inflammation: a mathematical model of neutrophil and macrophage interactions. Bull. Math. Biol. 2014;76:1953–1980. doi: 10.1007/s11538-014-9987-x.
- Earl D.J., Deem M.W. Parallel tempering: theory, applications, and new perspectives. Phys. Chem. Chem. Phys. 2005;7:3910. doi: 10.1039/b509983h.
- Eberhart R., Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science (MHS’95). IEEE; 1995. pp. 39–43.
- Efron B., Tibshirani R.J. An Introduction to the Bootstrap. Chapman and Hall; 1993.
- Egea J.A., Henriques D., Cokelaer T., Villaverde A.F., MacNamara A., Danciu D.-P., Banga J.R., Saez-Rodriguez J. MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics. 2014;15:136. doi: 10.1186/1471-2105-15-136.
- Erickson K.E., Rukhlenko O.S., Shahinuzzaman M., Slavkova K.P., Lin Y.T., Suderman R., Stites E.C., Anghel M., Posner R.G., Barua D. Modeling cell line-specific recruitment of signaling proteins to the insulin-like growth factor 1 receptor. PLoS Comput. Biol. 2019;15:e1006706. doi: 10.1371/journal.pcbi.1006706.
- Eydgahi H., Chen W.W., Muhlich J.L., Vitkup D., Tsitsiklis J.N., Sorger P.K. Properties of cell death models calibrated and compared using Bayesian approaches. Mol. Syst. Biol. 2013;9:644. doi: 10.1038/msb.2012.69.
- Faeder J.R., Hlavacek W.S., Reischl I., Blinov M.L., Metzger H., Redondo A., Wofsy C., Goldstein B. Investigation of early events in FcϵRI-mediated signaling using a detailed mathematical model. J. Immunol. 2003;170:3769–3781. doi: 10.4049/jimmunol.170.7.3769.
- Faeder J.R., Blinov M.L., Goldstein B., Hlavacek W.S. Rule-based modeling of biochemical networks. Complexity. 2005;10:22–41.
- Faeder J.R., Blinov M.L., Hlavacek W.S. Rule-based modeling of biochemical systems with BioNetGen. Methods Mol. Biol. 2009;500:113–167. doi: 10.1007/978-1-59745-525-1_5.
- Fey D., Halasz M., Dreidax D., Kennedy S.P., Hastings J.F., Rauch N., Munoz A.G., Pilkington R., Fischer M., Westermann F. Signaling pathway models as biomarkers: patient-specific simulations of JNK activity predict the survival of neuroblastoma patients. Sci. Signal. 2015;8:1–16. doi: 10.1126/scisignal.aab0990.
- Fitzgerald J.B., Schoeberl B., Nielsen U.B., Sorger P.K. Systems biology and combination therapy in the quest for clinical efficacy. Nat. Chem. Biol. 2006;2:458–466. doi: 10.1038/nchembio817.
- Fortin F.-A., De Rainville F.-M., Gardner M.-A., Parizeau M., Gagné C. DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 2012;13:2171–2175.
- Fröhlich F., Theis F.J., Hasenauer J. Uncertainty analysis for non-identifiable dynamical systems: profile likelihoods, bootstrapping and more. In: Mendes P., Dada J.O., Smallbone K., editors. Computational Methods in Systems Biology. Springer International Publishing; 2014. pp. 61–72. [Google Scholar]
- Fröhlich F., Kaltenbacher B., Theis F.J., Hasenauer J. Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS Comput. Biol. 2017;13:e1005331. doi: 10.1371/journal.pcbi.1005331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fröhlich F., Kessler T., Weindl D., Shadrin A., Schmiester L., Hache H., Muradyan A., Schütte M., Lim J.-H., Heinig M. Efficient parameter estimation enables the prediction of drug response using a mechanistic pan-cancer pathway model. Cell Syst. 2018;7:567–579.e6. doi: 10.1016/j.cels.2018.10.013. [DOI] [PubMed] [Google Scholar]
- Gandomi A.H., Yang X.-S., Talatahari S., Alavi A.H. Metaheuristic algorithms in modeling and optimization. In: Gandomi A.H., Yang X.-S., Talatahari S., Alavi A.H., editors. Metaheuristic Applications in Structures and Infrastructures. Elsevier; 2013. pp. 1–24. [Google Scholar]
- Garrett A. Inspyred: a framework for creating bio-inspired computational intelligence algorithms in Python. 2012. https://github.com/aarongarrett/inspyred
- Gelman A., Rubin D.B. Inference from iterative simulation using multiple sequences. Stat. Sci. 1992;7:457–511. [Google Scholar]
- Gillespie D.T. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 2006;58:35–55. doi: 10.1146/annurev.physchem.58.032806.104637. [DOI] [PubMed] [Google Scholar]
- Glover F., Laguna M., Martí R. Fundamentals of scatter search and path relinking. Control Cybernetics. 2000;29:652–684. [Google Scholar]
- Gupta A., Mendes P. An overview of network-based and -free approaches for stochastic simulation of biochemical systems. Computation. 2018;6:9. doi: 10.3390/computation6010009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gupta, S., Hainsworth, L., Hogg, J.S., Lee, R.E.C. and Faeder, J.R.. (2018), Evaluation of parallel tempering to accelerate Bayesian parameter estimation in systems biology, in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 690–697. [DOI] [PMC free article] [PubMed]
- Harmon B., Chylek L.A., Liu Y., Mitra E.D., Mahajan A., Saada E.A., Schudel B.R., Holowka D.A., Baird B.A., Wilson B.S. Timescale separation of positive and negative signaling creates history-dependent responses to IgE receptor stimulation. Sci. Rep. 2017;7:15586. doi: 10.1038/s41598-017-15568-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harris L.A., Hogg J.S., Tapia J.-J., Sekar J.A.P., Gupta S., Korsunsky I., Arora A., Barua D., Sheehan R.P., Faeder J.R. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. 2016;32:3366–3368. doi: 10.1093/bioinformatics/btw469. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hass H., Loos C., Alvarez E.R., Timmer J., Hasenauer J., Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics. 2019 doi: 10.1093/bioinformatics/btz020. Epub ahead of print. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heath J., Kwiatkowska M., Norman G., Parker D., Tymchyshyn O. Probabilistic model checking of complex biological pathways. Theor. Comput. Sci. 2008;391:239–257. [Google Scholar]
- Hindmarsh A.C., Brown P.N., Grant K.E., Lee S.L., Serban R., Shumaker D.E., Woodward C.S. SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Math. Softw. 2005;31:363–396. [Google Scholar]
- Hlavacek W.S., Csicsery-Ronay J., Baker L.R., Ramos Álamo M.D.C., Ionkov A., Mitra E.D., Suderman R., Erickson K.E., Dias R., Colvin J. A step-by-step guide to using BioNetFit. In: Hlavacek W.S., editor. Modeling Biomolecular Site Dynamics. Vol. 1945. Humana Press; 2018. pp. 391–419. (Methods in Molecular Biology). [DOI] [PubMed] [Google Scholar]
- Hoops S., Gauges R., Lee C., Pahle J., Simus N., Singhal M., Xu L., Mendes P., Kummer U. COPASI - a COmplex PAthway SImulator. Bioinformatics. 2006;22:3067–3074. doi: 10.1093/bioinformatics/btl485. [DOI] [PubMed] [Google Scholar]
- Hucka M., Finney A., Sauro H.M., Bolouri H., Doyle J.C., Kitano H., Arkin A.P., Bornstein B.J., Bray D., Cornish-Bowden A. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19:524–531. doi: 10.1093/bioinformatics/btg015. [DOI] [PubMed] [Google Scholar]
- Hussain F., Langmead C.J., Mi Q., Dutta-Moscato J., Vodovotz Y., Jha S.K. Automated parameter estimation for biological models using Bayesian statistical model checking. BMC Bioinformatics. 2015;16:S8. doi: 10.1186/1471-2105-16-S17-S8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kapfer E.-M., Stapor P., Hasenauer J. Challenges in the calibration of large-scale ordinary differential equation models. bioRxiv. 2019:690222. https://www.biorxiv.org/content/10.1101/690222v1. [Google Scholar]
- Kaschek D., Mader W., Fehling-Kaschek M., Rosenblatt M., Timmer J. Dynamic modeling, parameter estimation, and uncertainty analysis in R. J. Stat. Softw. 2019;88 [Google Scholar]
- Khalid A., Jha S.K. Calibration of rule-based stochastic biochemical models using statistical model checking. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2018. pp. 179–184.
- Kiselyov V.V., Versteyhe S., Gauguin L., De Meyts P. Harmonic oscillator model of the insulin and IGF1 receptors’ allosteric binding and activation. Mol. Syst. Biol. 2009;5:243. doi: 10.1038/msb.2008.78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kocieniewski P., Faeder J.R., Lipniacki T. The interplay of double phosphorylation and scaffolding in MAPK pathways. J. Theor. Biol. 2012;295:116–124. doi: 10.1016/j.jtbi.2011.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kozer N., Barua D., Orchard S., Nice E.C., Burgess A.W., Hlavacek W.S., Clayton A.H.A. Exploring higher-order EGFR oligomerisation and phosphorylation–a combined experimental and theoretical approach. Mol. Biosyst. 2013;9:1849–1863. doi: 10.1039/c3mb70073a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kraikivski P., Chen K.C., Laomettachit T., Murali T.M., Tyson J.J. From START to FINISH: computational analysis of cell cycle control in budding yeast. NPJ Syst. Biol. Appl. 2015;1:15016. doi: 10.1038/npjsba.2015.16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kühn C., Hillmann K. Rule-based modeling of labor market dynamics: an introduction. J. Econ. Interact. Coord. 2016;11:57–76. [Google Scholar]
- Kwiatkowska M., Norman G., Parker D. Using probabilistic model checking in systems biology. ACM SIGMETRICS Perform. Eval. Rev. 2008;35:14. [Google Scholar]
- Laomettachit T. Virginia Polytechnic Institute and State University; 2011. Mathematical Modeling Approaches for Dynamical Analysis of Protein Regulatory Networks with Applications to the Budding Yeast Cell Cycle and the Circadian Rhythm in Cyanobacteria. PhD thesis. [Google Scholar]
- Laomettachit T., Chen K.C., Baumann W.T., Tyson J.J. A model of yeast cell-cycle regulation based on a standard component modeling strategy for protein regulatory networks. PLoS One. 2016;11:e0153738. doi: 10.1371/journal.pone.0153738. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee D., Wiswall M. A parallel implementation of the simplex function minimization routine. Computat. Econ. 2007;30:171–187. [Google Scholar]
- Lee E., Salic A., Krüger R., Heinrich R., Kirschner M.W. The roles of APC and axin derived from experimental and theoretical analysis of the Wnt pathway. PLoS Biol. 2003;1:116–132. doi: 10.1371/journal.pbio.0000010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Leeuw T., Wu C., Schrag J.D., Whiteway M., Thomas D.Y., Leberer E. Interaction of a G-protein β-subunit with a conserved sequence in Ste20/PAK family protein kinases. Nature. 1998;391:191–195. doi: 10.1038/34448. [DOI] [PubMed] [Google Scholar]
- Leis J.R., Kramer M.A. The simultaneous solution and sensitivity analysis of systems described by ordinary differential equations. ACM Trans. Math. Softw. 1988;14:45–60. [Google Scholar]
- Liu B., Faeder J.R. Parameter estimation of rule-based models using statistical model checking. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2016. pp. 1453–1459.
- Lopez C.F., Muhlich J.L., Bachman J.A., Sorger P.K. Programming biological models in Python using PySB. Mol. Syst. Biol. 2013;9:646. doi: 10.1038/msb.2013.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Manz B.N., Jackson B.L., Petit R.S., Dustin M.L., Groves J. T-cell triggering thresholds are modulated by the number of antigens within individual T-cell receptor clusters. Proc. Natl. Acad. Sci. U S A. 2011;108:9089–9094. doi: 10.1073/pnas.1018771108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- MathWorks Least-squares (model fitting) algorithms. 2018. https://www.mathworks.com/help/optim/ug/least-squares-model-fitting-algorithms.html
- Medley J.K., Goldberg A.P., Karr J.R. Guidelines for reproducibly building and simulating systems biology models. IEEE Trans. Biomed. Eng. 2016;63:2015–2020. doi: 10.1109/TBME.2016.2591960. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Medley J.K., Choi K., König M., Smith L., Gu S., Hellerstein J., Sealfon S.C., Sauro H.M. Tellurium notebooks - an environment for reproducible dynamical modeling in systems biology. PLoS Comput. Biol. 2018;14:e1006220. doi: 10.1371/journal.pcbi.1006220. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mitra E., Suderman R. PyBioNetFit. 2019. https://pybnf.readthedocs.io/en/latest/
- Mitra E.D., Dias R., Posner R.G., Hlavacek W.S. Using both qualitative and quantitative data in parameter identification for systems biology models. Nat. Commun. 2018;9:3901. doi: 10.1038/s41467-018-06439-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Monine M.I., Posner R.G., Savage P.B., Faeder J.R., Hlavacek W.S. Modeling multivalent ligand-receptor interactions with steric constraints on configurations of cell-surface receptor aggregates. Biophys. J. 2010;98:48–56. doi: 10.1016/j.bpj.2009.09.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moraes A.O.S., Mitre J.F., Lage P.L.C., Secchi A.R. A robust parallel algorithm of the particle swarm optimization method for large dimensional engineering problems. Appl. Math. Model. 2015;39:4223–4241. [Google Scholar]
- Mukhopadhyay H., Cordoba S.-P., Maini P.K., van der Merwe P.A., Dushek O. Systems model of T cell receptor proximal signaling reveals emergent ultrasensitivity. PLoS Comput. Biol. 2013;9:e1003004. doi: 10.1371/journal.pcbi.1003004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nelder J.A., Mead R. A simplex method for function minimization. Comput. J. 1965;7:308–313. [Google Scholar]
- Neri F., Cotta C., Moscato P. Vol. 379. Springer; 2012. (Handbook of Memetic Algorithms). [Google Scholar]
- Oguz C., Laomettachit T., Chen K.C., Watson L.T., Baumann W.T., Tyson J.J. Optimization and model reduction in the high dimensional parameter space of a budding yeast cell cycle model. BMC Syst. Biol. 2013;7:53. doi: 10.1186/1752-0509-7-53. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pargett M., Umulis D.M. Quantitative model analysis with diverse biological data: applications in developmental pattern formation. Methods. 2013;62:56–67. doi: 10.1016/j.ymeth.2013.03.024. [DOI] [PubMed] [Google Scholar]
- Pargett M., Rundell A.E., Buzzard G.T., Umulis D.M. Model-based analysis for qualitative data: an application in Drosophila germline stem cell regulation. PLoS Comput. Biol. 2014;10:e1003498. doi: 10.1371/journal.pcbi.1003498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penas D.R., Banga J.R., González P., Doallo R. Enhanced parallel differential evolution algorithm for problems in computational systems biology. Appl. Soft Comput. 2015;33:86–99. [Google Scholar]
- Penas D.R., González P., Egea J.A., Doallo R., Banga J.R. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy. BMC Bioinformatics. 2017;18:52. doi: 10.1186/s12859-016-1452-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Posner R.G., Geng D., Haymore S., Bogert J., Pecht I., Licht A., Savage P.B. Trivalent antigens for degranulation of mast cells. Organ. Lett. 2007;9:3551–3554. doi: 10.1021/ol071175h. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Press W.H., Teukolsky S.A., Vetterling W.T., Flannery B.P. Cambridge University Press; 2007. Numerical Recipes 3rd Edition: The Art of Scientific Computing. [Google Scholar]
- Rackauckas C., Ma Y., Dixit V., Guo X., Innes M., Revels J., Nyberg J., Ivaturi V. A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions. arXiv. 2018 https://arxiv.org/abs/1812.01892. [Google Scholar]
- Raue A., Schilling M., Bachmann J., Matteson A., Schelke M., Kaschek D., Hug S., Kreutz C., Harms B.D., Theis F.J. Lessons learned from quantitative dynamical modeling in systems biology. PLoS One. 2013;8:e74335. doi: 10.1371/journal.pone.0074335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Raue A., Steiert B., Schelker M., Kreutz C., Maiwald T., Hass H., Vanlier J., Tönsing C., Adlung L., Engesser R. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015;31:3558–3560. doi: 10.1093/bioinformatics/btv405. [DOI] [PubMed] [Google Scholar]
- Rocklin M. Dask: parallel computation with blocked algorithms and task scheduling. In: Proceedings of the 14th Python in Science Conference. 2015. pp. 130–136.
- Romano D., Nguyen L.K., Matallanas D., Halasz M., Doherty C., Kholodenko B.N., Kolch W. Protein interaction switches coordinate Raf-1 and MST2/Hippo signalling. Nat. Cell Biol. 2014;16:673–684. doi: 10.1038/ncb2986. [DOI] [PubMed] [Google Scholar]
- Shirin A., Klickstein I.S., Feng S., Lin Y.T., Hlavacek W.S., Sorrentino F. Prediction of optimal drug schedules for controlling autophagy. Sci. Rep. 2019;9:1428. doi: 10.1038/s41598-019-38763-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shockley E.M., Vrugt J.A., Lopez C.F. PyDREAM: high-dimensional parameter inference for biological models in python. Bioinformatics. 2018;34:695–697. doi: 10.1093/bioinformatics/btx626. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith A.E., Coit D.W. Penalty functions. In: Baeck T., Fogel D., Michalewicz Z., editors. Handbook of Evolutionary Computation. Oxford University Press; 1997. pp. C5.2:1–C5.2:6. chapter C5.2. [Google Scholar]
- Sneddon M.W., Faeder J.R., Emonet T. Efficient modeling, simulation and coarse-graining of biological complexity with NFsim. Nat. Methods. 2011;8:177–183. doi: 10.1038/nmeth.1546. [DOI] [PubMed] [Google Scholar]
- Somogyi E.T., Bouteiller J.-M., Glazier J.A., König M., Medley J.K., Swat M.H., Sauro H.M. LibRoadRunner: a high performance SBML simulation and analysis library. Bioinformatics. 2015;31:3315–3321. doi: 10.1093/bioinformatics/btv363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sorokina O., Sorokin A., Armstrong J.D., Danos V. A simulator for spatially extended kappa models. Bioinformatics. 2013;29:3105–3106. doi: 10.1093/bioinformatics/btt523. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spellman P.T., Sherlock G., Zhang M.Q., Iyer V.R., Anders K., Eisen M.B., Brown P.O., Botstein D., Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell. 1998;9:3273–3297. doi: 10.1091/mbc.9.12.3273. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stapor P., Weindl D., Ballnus B., Hug S., Loos C., Fiedler A., Krause S., Hroß S., Fröhlich F., Hasenauer J. PESTO: parameter EStimation TOolbox. Bioinformatics. 2018;34:705–707. doi: 10.1093/bioinformatics/btx676. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Storn R., Price K. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997;11:341–359. [Google Scholar]
- Suderman R., Deeds E.J. Machines vs. ensembles: effective MAPK signaling through heterogeneous sets of protein complexes. PLoS Comput. Biol. 2013;9:e1003278. doi: 10.1371/journal.pcbi.1003278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suderman R., Hlavacek W.S. TRuML: a translator for rule-based modeling languages. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM Press; 2017. pp. 372–377.
- ter Braak C.J.F., Vrugt J.A. Differential Evolution Markov chain with snooker updater and fewer chains. Stat. Comput. 2008;18:435–446. [Google Scholar]
- Thomas B.R., Chylek L.A., Colvin J., Sirimulla S., Clayton A.H., Hlavacek W.S., Posner R.G. BioNetFit: a fitting tool compatible with BioNetGen, NFsim and distributed computing environments. Bioinformatics. 2016;32:798–800. doi: 10.1093/bioinformatics/btv655. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Villaverde A.F., Fröhlich F., Weindl D., Hasenauer J., Banga J.R. Benchmarking optimization methods for parameter estimation in large kinetic models. Bioinformatics. 2019;35:830–838. doi: 10.1093/bioinformatics/bty736. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Waltemath D., Wolkenhauer O. How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. IEEE Trans. Biomed. Eng. 2016;63:1999–2006. doi: 10.1109/TBME.2016.2555481. [DOI] [PubMed] [Google Scholar]
- Webb S.D., Sherratt J.A., Fish R.G. Cells behaving badly: a theoretical model for the Fas/FasL system in tumour immunology. Math. Biosci. 2002;179:113–129. doi: 10.1016/s0025-5564(02)00120-7. [DOI] [PubMed] [Google Scholar]
- Xu W., Smith A.M., Faeder J.R., Marai G.E. RuleBender: a visual interface for rule-based modeling. Bioinformatics. 2011;27:1721–1722. doi: 10.1093/bioinformatics/btr197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xue M., Del Bigio M.R. Intracerebral injection of autologous whole blood in rats: time course of inflammation and cell death. Neurosci. Lett. 2000;283:230–232. doi: 10.1016/s0304-3940(00)00971-x. [DOI] [PubMed] [Google Scholar]
- Yi T.-M., Kitano H., Simon M.I. A quantitative characterization of the yeast heterotrimeric G protein cycle. Proc. Natl. Acad. Sci. U S A. 2003;100:10764–10769. doi: 10.1073/pnas.1834247100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yu R.C., Pesce C.G., Colman-Lerner A., Lok L., Pincus D., Serra E., Holl M., Benjamin K., Gordon A., Brent R. Negative feedback that improves information transmission in yeast signalling. Nature. 2008;456:755–761. doi: 10.1038/nature07513. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zheng Y., Sweet S.M.M., Popovic R., Martinez-Garcia E., Tipton J.D., Thomas P.M., Licht J.D., Kelleher N.L. Total kinetic analysis reveals how combinatorial methylation patterns are established on lysines 27 and 36 of histone H3. Proc. Natl. Acad. Sci. U S A. 2012;109:13549–13554. doi: 10.1073/pnas.1205707109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Supplementary Materials
This ZIP file contains 31 folders, corresponding to the 31 problems shown in Table 2. Each folder contains a README file describing the problem in more detail; all model, data, and configuration files needed to run the problem in PyBioNetFit; and example output for each algorithm tested on the problem.
Data Availability Statement
The most recent version of PyBioNetFit is v1.0.1, available online at https://github.com/lanl/PyBNF. The repository includes a user manual, Documentation_PyBioNetFit.pdf. The same user manual is available online as a standalone website (Mitra and Suderman, 2019). General information about PyBioNetFit is available at http://bionetfit.nau.edu/.
PyBioNetFit can be installed on any current Linux, macOS, or Windows computer, as well as on Linux clusters. Python 3 is required and must be installed if it is not already included with the operating system. Root access is usually not required, so PyBioNetFit can be readily installed on shared clusters. PyBioNetFit can be installed from source by downloading the code from the GitHub repository above, or installed directly with the pip package manager using the command
python3 -m pip install pybnf
Data associated with the example fitting problems (Table 2) are provided as Data S1 and are also available online at https://github.com/RuleWorld/RuleHub/tree/2019Aug21/Published/Mitra2019. MCMC samples associated with Figure 4 are available in the BioStudies database (http://www.ebi.ac.uk/biostudies) under accession number S-BSST240.