ACS Synth. Biol. 2022 Dec 6;11(12):3921–3928. doi: 10.1021/acssynbio.2c00131

NLoed: A Python Package for Nonlinear Optimal Experimental Design in Systems Biology

Nathan Braniff 1, Taylor Pearce 1, Zixuan Lu 1, Michael Astwood 1, William S R Forrest 1, Cody Receno 1, Brian Ingalls 1,*
PMCID: PMC9765746  PMID: 36473701

Abstract


Modeling in systems and synthetic biology relies on accurate parameter estimates and predictions. Accurate model calibration relies, in turn, on data and on how well suited the available data are to a particular modeling task. Optimal experimental design (OED) techniques can be used to identify experiments and data collection procedures that will most efficiently contribute to a given modeling objective. However, implementation of OED is limited by currently available software tools that are not well suited for the diversity of nonlinear models and non-normal data commonly encountered in biological research. Moreover, existing OED tools do not make use of the state-of-the-art numerical tools, resulting in inefficient computation. Here, we present the NLoed software package and demonstrate its use with in vivo data from an optogenetic system in Escherichia coli. NLoed is an open-source Python library providing convenient access to OED methods, with particular emphasis on experimental design for systems biology research. NLoed supports a wide variety of nonlinear, multi-input/output, and dynamic models and facilitates modeling and design of experiments over a wide variety of data types. To support OED investigations, the NLoed package implements maximum likelihood fitting and diagnostic tools, providing a comprehensive modeling workflow. NLoed offers an accessible, modular, and flexible OED tool set suited to the wide variety of experimental scenarios encountered in systems biology research. We demonstrate NLoed’s capabilities by applying it to experimental design for characterization of a bacterial optogenetic system.

1. Introduction

Biological systems are heterogeneous in terms of both their components and the interactions between them. Mathematical modeling of biological systems provides researchers with a valuable tool set for investigating this complexity. Models can be used to generate new hypotheses about extant systems or to predict properties of novel synthetic biological designs. These models are typically nonlinear, multi-dimensional, and dynamic and depend on parameters that cannot be directly measured. Accurate estimation of the values of these parameters is critical: the accuracy of the parameterization determines the utility of the model predictions and the model’s overall value as a tool for investigating system behavior.1 Accurate parameterization of complex models often faces challenges due to the high cost of data collection. This challenge can be exacerbated by uncertainty in how to design experiments to maximize calibration accuracy.

Recent progress in experimental techniques, specifically with respect to automation and high-throughput methods (recently reviewed2), reduces the barriers imposed by experimental costs. However, without complementary advances in experimental design, cheaper data collection may lead to increases in the quantity of data but not in the quality, and so may have a limited impact on model calibration accuracy. Experimental design tools are especially important for nonlinear models, for which intuition can be a poor guide. The value of optimal experimental design in improving parameter estimates for nonlinear and dynamic models in systems biology has been previously demonstrated.1,3

Optimal experimental design (OED) techniques were originally developed in statistics for fitting regression models accurately with minimal experimental effort.4 These techniques have been expanded to nonlinear and dynamic models and have seen increasing use in the past decades, including applications to biological systems.2,5,6 However, most of these methods rely on custom implementations of their specific numerical algorithms. The lack of established software tools has no doubt limited the adoption of OED in experimental studies, confining its use to research groups with specialized knowledge of OED methods.7,8

Existing optimal experimental design tools include statistical software suites such as SAS and R, which provide a number of optimal design packages.9,10 These packages are primarily aimed at regression-type models used in statistics and are of limited use to systems biologists.

Several software packages for OED have emerged from the pharmacokinetics–pharmacodynamics (PKPD) research community; examples include the PopED package11 (available in MATLAB) and PFIM 3.012 (available in R). For a full comparison of OED tools in the PKPD field, see ref (13). These PKPD tools focus on nonlinear and dynamic models. However, PKPD methods emphasize mixed-effects models (with test-subject-specific sources of variability), which can be computationally demanding and are less relevant for those working outside the PKPD field.

Software development for Bayesian optimal design has been rare, possibly due to the heavy computational cost. A very early example is an XLISP-STAT package;14 a more recent example is the aceBayes package in R,15 which has seen some application to dynamic biological models.16 Bayesian techniques offer great flexibility in modeling uncertainty and avoid the parameter dependence of local optimal design methods. However, this flexibility comes at additional computational costs resulting from the need for extensive Monte Carlo sampling or other forms of numerical integration.

A resurgence of interest in experimental design applied to biological systems has come from the synthetic biology community.17 This includes the use of traditional design of experiments techniques based on statistical models to improve the efficiency of biological designs17,18 as well as optimal design.19 Of note, the AMIGO2 software toolbox provides a wide variety of simulation and optimization tools relevant to systems and synthetic biologists. Originally published over a decade ago,20 the newest version21 includes OED tools within MATLAB. Optimal model selection has also received attention in the synthetic biology community,22 as has Bayesian OED for biological systems, implemented in the forthcoming BOMBs package.23

Wider adoption of OED by experimental systems biologists can be facilitated by software tools that are focused on the specific needs of these researchers. Specifically lacking are optimal design tools (and model building tools in general) for data that are not Gaussian-distributed. Systems biology is replete with experimental assays that have non-Gaussian distributions, including plate counts (Poisson), gene expression (log-normal), and viability assays (Bernoulli or binomial). The majority of published OED approaches rely on Gaussian approximations, which are often appropriate only under restrictive assumptions.

Also lacking are software tools that support the iterative experimentation needed for modeling nonlinear biological systems, especially in a dynamic context. Nonlinear systems can exhibit dramatic differences in behavior over parameter and input ranges. A priori, experimenters may have a poor understanding of which regime is relevant for their study; iterative experimentation is generally required to identify the operating regime and focus on the dominant effects. While some of this iteration could ideally be avoided by using Bayesian design tools such as aceBayes, there remain logistical reasons why iteration is needed in experimental workflows, and the added computational cost of Bayesian designs can make rapid iteration difficult. NLoed differentiates itself from existing tools by focusing on an easy-to-use and efficient sequential design workflow for nonlinear and non-Gaussian systems.

2. Methods

In this work, we present the NLoed software package, a purpose-built and user-friendly OED software tool set developed to make OED more accessible to systems biologists. The NLoed package has been released under an open-source licence; it is hosted on Github (https://github.com/ingallslab/NLoed). The package is written in Python 3 and can be used either through a Python interpreter or in Python scripts. The NLoed package is object-oriented; it provides classes for model building and experimental design. The package classes and their associated methods can be flexibly interfaced to facilitate a variety of workflows for both real experiments and simulation studies. NLoed uses Pandas and Numpy data structures, allowing easy data exchange with other numerical packages.

In developing NLoed, we aimed to implement well-established local optimal design methods that are simple and practical, while making use of state-of-the-art numerical methods. The OED methods in NLoed rely heavily on the Fisher information matrix for design optimization. Calculation of the Fisher information relies, in turn, on accurate and rapid local parametric sensitivity computation. Moreover, design optimization requires a nonlinear programming solver. Although it is possible to estimate sensitivity and objective derivatives as finite differences, these methods can be computationally costly and inaccurate.24 As an alternative, NLoed relies on CasADi,25 a rapid prototyping package for optimal control that provides automatic differentiation functionality and a direct interface to the nonlinear programming solver IPOPT.26 CasADi allows for rapid and accurate computation of both the Fisher information and design objective derivatives. CasADi also provides a symbolic interface for model construction, facilitating the efficient formulation of complex models.
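As a minimal illustration of the CasADi functionality that NLoed builds on (this is a standalone sketch, not NLoed source code), the following snippet uses automatic differentiation to obtain exact parametric sensitivities of a simple symbolic response function:

    import casadi as cs

    # Symbolic input and parameters for a toy saturating response
    x = cs.SX.sym('x')
    theta = cs.SX.sym('theta', 2)
    eta = theta[0] * x / (theta[1] + x)

    # Exact parametric sensitivities via automatic differentiation
    sens = cs.jacobian(eta, theta)                  # 1x2 symbolic Jacobian
    sens_func = cs.Function('sens', [x, theta], [sens])
    print(sens_func(2.0, [10.0, 5.0]))              # numerical evaluation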

All models in the package take the following mathematical form

    \eta_i = f_i(x, \theta)   (1)

    Y_i \sim p_i(y_i \mid \eta_i)   (2)

Here, the vector x captures the model inputs, while the vector θ consists of the model parameters. The index i is used to distinguish separate observations. For each value of i, the function fi(·) maps model inputs and parameter values to the observation’s sampling statistics, collected in vector ηi. The sampling statistics characterize the distribution that describes a given observation. For example, for a normally distributed observation variable, the sampling statistics are the mean and variance, η = (μ, σ²), which completely characterize the distribution from which the observation is drawn. (Recall that in statistical modeling, observations are treated as realizations of a random variable that is established by the model; the function f(·) describes the random variables in terms of the model inputs and parameters.) In the NLoed package, the function f(·) is implemented using CasADi’s symbolics, which can consist of analytic expressions or numerical algorithms. The number of observations and inputs an NLoed model can accept is constrained only by the computational cost of computing optimal designs. NLoed can be used to optimize designs for multi-input/output models, including multi-state dynamic systems.

Equation 2 specifies each observation, Yi, as drawn from the probability distribution specified by the sampling statistics ηi. The notation pi(yi|ηi) emphasizes the conditioning on the sampling statistics. Using this model formulation, NLoed accommodates experimental observations that are non-normally distributed, including counts and strictly positive data. The user can specify a specific distribution type for each observation variable according to the experimental scenario. Supported distributions include Normal, Poisson, Binomial, Bernoulli, Lognormal, Exponential, and Gamma. NLoed is thus applicable to a wide variety of experimental scenarios (e.g., involving both gene expression and plate counts) and modeling frameworks (e.g., deterministic approximations of stochastic models via moment closure27 or the linear noise approximation28). Note that an observation Yi corresponds to a measurement from a specific observation channel (e.g., concentration, light intensity, optical density, etc.) at a specific, pre-specified time point. (Therefore, for example, a measurement repeated at two different time points corresponds to two different observations Yi.)
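For example, a hypothetical Poisson-distributed observable (such as a plate count) would be described by a single sampling statistic, the rate λ, expressed as a CasADi function of the inputs and parameters. The snippet below is illustrative; the 'Poisson' distribution label would be attached when the NLoed Model is constructed (see Section 3.1):

    import casadi as cs

    # Hypothetical Poisson-distributed observable (e.g., a colony count);
    # its single sampling statistic is the Poisson rate lambda.
    x = cs.SX.sym('x')                    # input, e.g., inducer level
    theta = cs.SX.sym('theta', 2)         # log-scale parameters keep lambda positive
    lam = cs.exp(theta[0]) + cs.exp(theta[1]) * x
    f_count = cs.Function('f_count', [x, theta], [lam])
    # The 'Poisson' distribution type would be declared for this observable
    # when the NLoed Model is constructed (illustrated in Section 3.1).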

For a given system (eqs 1 and 2), an experimental design in NLoed is defined by a pair of sets (X, Ξ) characterizing inputs and observation replicate counts, as follows. Here, X is the set of input vectors, xj, that describe the set of experiments to be executed, that is, the set of experimental conditions to be observed. The input vector can encode the numerical settings of treatments, such as chemical concentrations or light intensity, or environmental conditions, such as temperature or media composition. Categorical perturbations, such as strain or treatment type, can be encoded as binary entries. Time-varying inputs are implemented as piece-wise constants by first subdividing the experimental time window and then specifying a constant value over each sub-interval.

The system specification (eqs 1 and 2) defines the collection of possible observations Yi. The set Ξ contains the allocation of replicates: observable Yi is to be assessed ξi,j times under the jth input setting. These allocations are non-negative integer counts. However, to improve numerical tractability, NLoed relaxes this integer constraint so that the ξi,j correspond to non-negative real-valued weights; the real-valued allocations are then rounded to integer values after an optimal solution is found. We refer to a design with real-valued replicate allocations as relaxed, while a design with integer-valued allocations is called exact. Note that, because of this relaxation and rounding, NLoed’s exact designs are not strictly optimal in a mathematical sense: although they have been improved via optimization, better exact designs may exist.
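For intuition only, the sketch below shows one common way to convert relaxed weights into an exact design with a fixed sample budget, using largest-remainder rounding; this is an assumption about a reasonable rounding scheme, not necessarily the algorithm implemented by NLoed’s round() method:

    import numpy as np

    def apportion(weights, budget):
        """Largest-remainder rounding of relaxed replicate weights to integer counts."""
        weights = np.asarray(weights, dtype=float)
        ideal = weights / weights.sum() * budget      # real-valued replicate counts
        counts = np.floor(ideal).astype(int)
        remainder = ideal - counts
        # hand the leftover replicates to the largest remainders
        for idx in np.argsort(remainder)[::-1][:budget - counts.sum()]:
            counts[idx] += 1
        return counts

    print(apportion([0.27, 0.33, 0.26, 0.14], budget=15))   # -> [4 5 4 2]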

Design optimization problems in NLoed have the following general form

    \max_{X,\,\Xi}\ \Psi\!\left(\mathcal{I}(X, \Xi;\, \hat{\theta})\right) \quad \text{subject to} \quad \sum_{i=1}^{M}\sum_{j=1}^{N} \xi_{i,j} = 1, \quad \xi_{i,j} \ge 0   (3)

Here, I(X, Ξ; θ̂) is the expected Fisher information matrix for the experimental design (X, Ξ), evaluated at the nominal parameter vector θ̂. The Fisher information appears in the objective because it is asymptotically related to the expected variability of maximum likelihood estimates of the model parameters.29 The Fisher information therefore serves as a useful metric for design utility. The objective function, Ψ(·), maps the Fisher information matrix to a scalar. NLoed maximizes the determinant of the Fisher information matrix as its objective; this results in a D-optimal design, which minimizes the expected confidence volume of the parameter estimates. The integer N is the number of unique input vectors considered in the design, and M is the number of observables (i.e., output channel–time point pairs). The relaxed replicate allocations, ξi,j, are constrained to be non-negative and to sum to one: ∑i ∑j ξi,j = 1.

Currently, NLoed only implements D-optimal design, which is widely used due to its utility and efficiency. However, this approach cannot be applied to non-identifiable models (for which the Fisher information matrix may be singular, with zero determinant across all design candidates). NLoed does not currently provide objectives for these more challenging scenarios, but it can be used to probe for convergence across different model reductions or alternative design constraints in order to help overcome non-identifiability. Future versions will include more objective functions to add flexibility.
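For reference, the D-optimality criterion can be written as the (log-)determinant of the design’s expected Fisher information; the following NumPy sketch (illustrative only, not NLoed internals) evaluates this objective for a relaxed design given per-observation information matrices:

    import numpy as np

    def d_objective(fims, weights):
        """Log-determinant (D-optimality) objective for a relaxed design.

        fims    : per-observation/per-input Fisher information matrices
        weights : relaxed replicate allocations (non-negative, summing to one)
        """
        total = sum(w * fim for w, fim in zip(weights, fims))
        sign, logdet = np.linalg.slogdet(total)       # slogdet avoids overflow
        return logdet if sign > 0 else -np.inf        # singular FIM: unusable design

    # Two candidate observations for a two-parameter model (toy numbers)
    fims = [np.array([[4.0, 1.0], [1.0, 2.0]]),
            np.array([[1.0, 0.0], [0.0, 6.0]])]
    print(d_objective(fims, [0.5, 0.5]))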

Assuming that the observations are statistically independent, the expected Fisher information for an overall design is the sum of the individual Fisher information matrices, Ii(xj, θ̂), taken over each input vector, xj, and observation, yi, weighted by the replicate allocations ξi,j. (Currently, NLoed only supports optimal design for experiments with independent observations.) The individual Fisher information matrices are defined as

    \mathcal{I}_i(x_j, \theta) = \mathrm{E}_{y_i}\!\left[\left(\frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \theta}\right)^{\!\top}\left(\frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \theta}\right)\right]   (4)

This is the expected value of the outer product of the parametric sensitivities of the log-likelihood with themselves (i.e., the variance of the score vector).29 NLoed uses a chain rule decomposition to simplify the computation of the Fisher information; this approach also enables easy computation of the Fisher information for non-Gaussian distributions (see ref (29) for further details). From the definition in eq 1, we have that ηi = fi(x, θ), and therefore, the log-likelihood sensitivity vector can be decomposed as

    \frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \theta} = \frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \theta}   (5)

    \phantom{\frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \theta}} = \frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \eta_i}\,\frac{\partial f_i(x, \theta)}{\partial \theta}   (6)

The term ∂fi(xj, θ)/∂θ is the parametric sensitivity of the sampling statistics. This sensitivity can be determined via automatic differentiation of the user-provided model function f(·). The term ∂ log pi(yi|ηi)/∂ηi and the expectation can be subsumed into a matrix, denoted here Mi(ηi), defined as

    \mathcal{M}_i(\eta_i) = \mathrm{E}_{y_i}\!\left[\left(\frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \eta_i}\right)^{\!\top}\left(\frac{\partial \log p_i(y_i \mid \eta_i)}{\partial \eta_i}\right)\right]   (7)

which can be computed analytically from the user-provided function f(·) for the observation distributions supported by NLoed—including several non-Gaussian distributions.29 The Fisher information for each input and observation in a design can then be evaluated as

    \mathcal{I}_i(x_j, \theta) = \left(\frac{\partial f_i(x_j, \theta)}{\partial \theta}\right)^{\!\top} \mathcal{M}_i(\eta_i)\,\left(\frac{\partial f_i(x_j, \theta)}{\partial \theta}\right)   (8)
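To make eqs 5–8 concrete, the following standalone sketch evaluates the per-observation Fisher information for a normally distributed observable with known variance, for which the matrix M(η) reduces to 1/σ²; this is a textbook special case, not NLoed source code:

    import casadi as cs
    import numpy as np

    # Mean of a normally distributed observable with known variance sigma2
    x = cs.SX.sym('x')
    theta = cs.SX.sym('theta', 2)
    mu = theta[0] * x / (theta[1] + x)              # sampling statistic eta = mu

    # Parametric sensitivity of the sampling statistic (eq 6), via automatic differentiation
    sens = cs.Function('sens', [x, theta], [cs.jacobian(mu, theta)])

    sigma2 = 0.1                                    # known observation variance
    S = np.array(sens(2.0, [10.0, 5.0]))            # 1x2 sensitivity row vector
    fim = S.T @ S / sigma2                          # eq 8 with M(eta) = 1/sigma2
    print(fim)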

For numerical tractability, the design optimization problem can be further simplified in a variety of ways, depending on how the candidate input vectors, xj, and replicate allocations, ξi,j, are encoded as optimization variables in the nonlinear programming solver. NLoed offers users ample flexibility in how the optimization problem is posed, including the option to treat optimized design variables as either continuously or discretely valued and, in cases other than sample time selection, the choice between a relaxed sample allocation formulation and solving directly for an exact design.

In addition to design optimization, NLoed includes several model-building and diagnostic tools, including methods for model fitting, sensitivity analysis, model simulation, data sampling, and design evaluation. These auxiliary tools are complementary to NLoed’s primary OED functionality. They facilitate incorporation of NLoed into a complete model building and experimental workflow that aligns with the user’s optimal design objectives.

3. Implementation

The NLoed library is built around two core classes: the Model class and the Design class. The Model class captures all the mathematical information specified in the model framework in eqs 1 and 2. The Model class also provides methods for model calibration and simulation and for evaluating a given design’s performance on the model instance. The Design class accepts Model instances as well as other design information and then implements and solves the design optimization problem. The Design class also provides methods for rounding a relaxed design to an exact design with a specified total sample budget. Further details on the class architecture can be found in the documentation provided on the NLoed Github repository.

3.1. Model Building

As an example, we consider the CcaS/CcaR optogenetic system described by Schmidl et al.30 (A detailed mechanistic photoconversion model of this system was developed in recent work.31 A closely related system was characterized earlier by Olson et al.,32 who determined an appropriate model structure; a simplification of that structure is used below.) We focus on the system’s steady-state response to pulse-width modulated (PWM) green light. (Application to a complementary dynamic model is illustrated in the Supporting Information.) We describe the steady-state response by a Hill function model with normally distributed, heteroskedastic observation errors

    \mu = \alpha_o + \alpha\,\frac{x^{n}}{K^{n} + x^{n}}, \qquad Y \sim \mathcal{N}\!\left(\mu,\ s\,\mu^{2}\right)   (9)

Here, the single input x is the green light level delivered during growth of the culture (as a percentage of the maximal light level). The single independent sampling statistic, η = μ, is the observation mean. The single observable Y, assumed to be normally distributed, is the steady-state mean GFP expression of a batch culture (see the Supporting Information for details). The components of the unknown model parameter vector θ = (αo, α, n, K) characterize basal expression, maximal induced expression, sensitivity (Hill coefficient), and half-maximal input, respectively. The variance of the GFP observations is assumed to be proportional to the square of the mean, with s serving as the proportionality constant. We were not interested in optimizing designs for estimates of s; its value was therefore fixed based on initial data (see the Supporting Information for details).

To define a model in NLoed, we first specify an expression for the function f(·) mapping inputs to sampling statistics using CasADi’s symbolic types. Code listing 1 shows this process for the model in eq 9.
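Listing 1 is distributed with the NLoed documentation; a sketch of its content for eq 9 is shown below, using CasADi symbolics and log-transformed parameters (variable names, and the decision to return both the mean and the variance, are assumptions that should be checked against the package documentation):

    import casadi as cs

    # Input and (log-transformed) parameters for the Hill model in eq 9
    x = cs.SX.sym('x')                     # green light level (% of maximum)
    theta = cs.SX.sym('theta', 4)          # log(alpha_o), log(alpha), log(n), log(K)
    alpha_o, alpha, n, K = (cs.exp(theta[i]) for i in range(4))

    s = 0.06                               # fixed variance proportionality constant
    mean = alpha_o + alpha * x**n / (K**n + x**n)
    var = s * mean**2                      # heteroskedastic observation variance

    # Final line of the listing: a CasADi function implementing f(.) from eq 9
    func = cs.Function('func', [x, theta], [cs.vertcat(mean, var)])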

In Listing 1, a log transformation of the original parameters from eq 9 is used to ensure that the parameter values remain positive during fitting. (This transformation can also alleviate issues with an ill-conditioned Fisher information matrix.) In the final line of Listing 1, a CasADi function, func, is generated to implement f(·) from eq 9. The user can then call the NLoed Model class constructor to create the NLoed model instance; this is shown in Listing 2.
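A corresponding sketch of the Model construction step follows, continuing the code above and assuming the package is imported as nloed; the constructor argument structure shown (a list of function–distribution pairs plus input and parameter names) is an assumption based on the workflow described here, and the exact signature should be taken from the NLoed documentation:

    import nloed

    # Assumed constructor arguments: pair the CasADi function with its
    # distribution type, and name the model inputs and parameters.
    observables = [(func, 'Normal')]
    input_names = ['light']
    param_names = ['log_alpha_o', 'log_alpha', 'log_n', 'log_K']
    hill_model = nloed.Model(observables, input_names, param_names)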

For the CcaS/CcaR case study, we conducted initial experiments observing, in triplicate, individual batch cultures’ mean GFP expression in steady state under 0, 1.5, 6, 25, and 100% of the maximal green light level (15 observations in total, details in the Supporting Information). This preliminary data set was used to generate initial parameter estimates, from which NLoed’s locally optimal designs could be determined. Listing 3 details the use of the Model class’s fit() method in generating initial parameter estimates of αo = 552.8 a.u., α = 9493.2 a.u., K = 8.5%, and n = 2.4. The variance proportionality constant was estimated independently as s = 0.06 (see the Supporting Information for details).
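A sketch of the fitting step follows, continuing the code above; the data-frame column names are assumptions, and the GFP values shown are synthetic placeholders rather than the experimental measurements:

    import numpy as np
    import pandas as pd

    # Preliminary design: triplicate observations at five green light levels
    light_levels = [0.0, 1.5, 6.0, 25.0, 100.0]
    inputs = np.repeat(light_levels, 3)

    # Synthetic placeholder measurements (NOT the experimental data), included
    # only so that the sketch is self-contained
    rng = np.random.default_rng(0)
    gfp = 550 + 9500 * inputs**2.4 / (8.5**2.4 + inputs**2.4)
    gfp = gfp * (1 + 0.1 * rng.standard_normal(inputs.size))

    data = pd.DataFrame({'light': inputs, 'Variable': 'GFP', 'Observation': gfp})
    fit_result = hill_model.fit(data)       # maximum likelihood estimates
    print(fit_result)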

Beyond fitting, Model class instances can perform a number of model development tasks via available class methods:

3.1.1. Fitting Diagnostics

In addition to implementing a maximum likelihood fitting algorithm (as in Listing 3), the fit() method can generate profile likelihood-based confidence intervals for the parameter estimates and can plot visual diagnostics of parameter identifiability such as confidence contours, likelihood profiles, and profile trace projections.33

3.1.2. Model Predictions

The predict() method allows the user to generate predictions of the means of the model’s observable outputs for a given input setting. This method can also generate prediction uncertainty intervals to quantify the effects of parameter uncertainty on predictions and can perform local parametric sensitivity analysis.

3.1.3. Design Evaluation

The evaluate() method allows the user to evaluate candidate experimental designs with respect to the given model using interpretable metrics such as the expected covariance, bias, and mean-squared error of the parameter estimates. The evaluate() method can use asymptotic or Monte Carlo-based computation to determine the design evaluation metrics.

3.1.4. Data Simulation

The sample() method complements the predict() method by simulating random experimental observations from a given experimental design, allowing the user to generate sample data for simulation studies.
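Continuing the sketch above, these methods might be combined as follows; the method names are those described in this section, while the argument names and option keys are assumptions:

    import numpy as np
    import pandas as pd

    # Profile-likelihood confidence intervals alongside the point estimates
    fit_result = hill_model.fit(data, options={'Confidence': 'Profiles'})

    # Mean predictions (and uncertainty intervals) over a grid of light levels
    grid = pd.DataFrame({'light': np.linspace(0.01, 100, 50), 'Variable': 'GFP'})
    predictions = hill_model.predict(grid, fit_result)

    # Evaluate a candidate design, then simulate data from it for a simulation study
    candidate = pd.DataFrame({'light': [0.01, 2.8, 8.1, 100.0],
                              'Variable': 'GFP',
                              'Replicates': [4, 5, 4, 2]})
    metrics = hill_model.evaluate(candidate, fit_result)
    simulated = hill_model.sample(candidate, fit_result)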

3.2. Optimal Experimental Design

To generate an optimal design for the CcaS/CcaR model, we passed the previously created Model class instance to the Design constructor (Listing 4). The call to the Design constructor shown in Listing 4 includes specification of the model object, input constraints, design objective, and the nominal parameter values around which the design is to be optimized. In the first lines of Listing 4, the inp dictionary is used to define the input constraints. This dictionary specifies the bounds and the number of unique levels of the green light input that are to be allowed during the design optimization. In this case, the light level is bounded between 0.01 and 100% and designs consisting of at most four unique levels are considered. NLoed provides a variety of options for how the input levels are handled numerically and which constraints are applied; see the documentation on Github for further details.
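A sketch of the Design construction step (including the init field and the rounding step discussed below) follows, continuing the code above; the dictionary keys and constructor argument names are assumptions based on the description here and should be checked against the NLoed documentation:

    import pandas as pd
    import nloed

    # The preliminary experiment expressed as a design data frame
    # (NLoed exchanges designs as Pandas objects; column names are assumptions)
    design0 = pd.DataFrame({'light': [0.0, 1.5, 6.0, 25.0, 100.0],
                            'Variable': 'GFP',
                            'Replicates': 3})

    # Input constraints: bounds on the light level and at most four unique levels
    inp = {'Input': 'light', 'Bounds': (0.01, 100.0), 'Levels': 4}

    # Nominal parameters are the (log-transformed) estimates from the preliminary fit
    design_object = nloed.Design(hill_model, fit_result, 'D',
                                 continuous_inputs=inp,
                                 init=design0)

    # Last line of the listing: round the relaxed design to 15 total replicates
    design1 = design_object.round(15)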

The nominal parameter values are needed in the Design constructor call because NLoed uses the expected Fisher information (a local asymptotic approximation) to compute the design objective. The optimality of the resulting design therefore depends on how close the nominal values are to the unknown true parameters. It can thus be risky to optimize the design for an entire experiment based on highly uncertain nominal parameter values. This risk can be mitigated by performing a series of sequentially optimized experiments.10 In sequential design, rather than using an uncertain nominal parameter set to allocate all observations in a study, the experimenter subdivides the planned experiment into a series of sub-experiments. The design generated for the first sub-experiment may be sub-optimal, but even a sub-optimal experiment yields additional data. These data increase the sample size, and because parameter estimation accuracy is generally expected to improve with the square root of the sample size,29 each additional sub-experiment will, on average, yield parameter estimates nearer to the unknown true values. Therefore, by optimizing each sub-experiment with respect to the current best parameter estimate, the optimality of the combined data set with respect to the unknown true parameter values is expected to improve. To demonstrate this process, in Listing 4, the parameter estimates from the preliminary experiment are used as the nominal values for the design of an optimized experimental run on the CcaS/CcaR system.

Using the parameter estimates from a previous (sub-)experiment for design optimization will, on its own, improve the expected performance of the resulting design. However, even greater efficiency can be achieved by conditioning the design optimization directly on the past data as well: a subsequent experiment is most efficiently selected when it is optimized to complement the previously gathered observations with respect to the objective, rather than disregarding the information already available. This procedure is known as sequential experimental design. Each design produced in this way is conditionally optimal given all the past data, in the sense that the new design assumes that the past observations will be included in any subsequent parameter estimation. This conditional optimization is shown in Listing 4, where the initial design, design0, is passed into the Design constructor via the init field. By passing in the initial design, NLoed will select a new design that best complements the initial design in order to optimize the objective.

When the Design class is instantiated near the end of Listing 4, NLoed calls IPOPT to generate an optimal relaxed design, in which the replicate allocations Ξ are treated as continuous weights. Optimization via IPOPT is local, which means that the result can be sensitive to the initial starting point of the search. By default, NLoed uses a random initialization, which often performs well for small systems with strong identifiability. In more challenging situations, the user can provide specific starting designs to explore the effects of initialization. Reformulating the optimization problem by discretizing the design space, which can introduce convexity, can also help and is supported by NLoed (see the background documentation on Github). Should the optimization fail to converge, the model should be checked for structural identifiability with appropriate tools;34 a structurally unidentifiable model will have a singular Fisher information matrix. Discretization of the design space, as suggested above, can also aid in assessing ill-conditioned and non-identifiable models.

To produce a design that can be implemented, the last line of Listing 4 uses the round() method to generate a usable design with a total of 15 replicates (over all observations, mirroring the size of the preliminary experiment). The resulting optimized exact design, contained in the variable design1, is shown in Table 1. Note that the optimal design, perhaps non-intuitively, concentrates on the lowest part of the input range. This optimal design took NLoed 0.25 s to generate (details in the Supporting Information). As a point of comparison, performing a similar optimization with aceBayes, a Bayesian OED tool, took 19.179 s, about 80 times longer (see the Supporting Information for details). We conclude that the local approaches implemented in NLoed can be significantly faster than Bayesian approaches, making them suitable when computational costs are limiting (e.g., during extensive or rapid iteration, or for analysis of complex models). A comparison with AMIGO2, which focuses specifically on dynamic experiments, is provided in the Supporting Information. AMIGO2 is moderately faster on the example presented there, suggesting, at least in this case, that numerical integration of adjoint sensitivity equations currently outperforms the use of automatic differentiation.

Table 1. Optimal Design for the CcaS/CcaR Model.

green light input, xj    number of replicates, ξi,j
x1 = 0.01%               4
x2 = 2.8%                5
x3 = 8.1%                4
x4 = 100.0%              2

We then executed the optimal design and re-fit the model to the combined data (preliminary and follow-up optimal experiments). The updated parameter estimates are αo = 525.6 a.u., α = 9876.1 a.u., K = 10.4%, and n = 2.3. To compute the size of each parameter’s approximate 95% confidence interval, we used the Model class’s evaluate() method (Listing 5; full code in the Supporting Information). In Listing 5, the evaluate() method produces a parameter covariance matrix for the combined design, including both the initial design, design0, and the optimal design, design1. The covariance matrix is then used to compute the asymptotic confidence intervals. For comparison, we generated asymptotic confidence intervals for two other cases: (i) the preliminary experiment alone and (ii) the preliminary experiment followed by a replicate of the preliminary experiment (code provided in the Supporting Information). (The replicated initial design provides a controlled comparison for the increased sample size.) Figure 1 shows the asymptotic confidence intervals for all three cases. These results show improved precision of the parameter estimates when using the optimized design rather than simply replicating the original design. The improvements accrue primarily in the two nonlinear parameters, K and n, for which the optimal design noticeably outperforms direct replication.
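A sketch of the evaluation step follows, continuing the code above; the option keys and the form of the returned covariance are assumptions:

    import numpy as np
    import pandas as pd

    # Evaluate the expected parameter covariance for the combined design
    combined = pd.concat([design0, design1], ignore_index=True)
    eval_out = hill_model.evaluate(combined, fit_result, options={'Covariance': True})
    covariance = np.asarray(eval_out)

    # Asymptotic 95% confidence half-widths from the covariance diagonal
    half_widths = 1.96 * np.sqrt(np.diag(covariance))
    print(half_widths)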

Figure 1. Comparison of 95% confidence interval sizes, expressed as a percentage of the parameter estimate values, between various combinations of the initial and optimal designs.

The impact of the experimental design on model performance can be visualized via the effects of uncertainty on model predictions. NLoed produces the necessary data through the Model class’s predict() method. After the covariance matrix is generated in Listing 5, it is passed to the predict() method to generate model predictions and a 95% uncertainty interval for the observations. This interval accounts for both observation variability and the parameter uncertainty resulting from the overall design. In this scenario, the prediction improvement was marginal (results not shown), but in general, prediction uncertainty intervals provide a useful way to compare designs. Figure 2 shows a plot of the model predictions and the observation interval along with the initial and optimal data.
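Continuing the sketch, the covariance matrix might be passed to predict() as follows; the keyword names are assumptions:

    import numpy as np
    import pandas as pd

    # Prediction grid over the input range; the parameter covariance is passed in
    # so the 95% observation interval reflects parameter uncertainty
    grid = pd.DataFrame({'light': np.linspace(0.01, 100, 100), 'Variable': 'GFP'})
    predictions = hill_model.predict(grid, fit_result,
                                     covariance_matrix=covariance,
                                     options={'ObservationInterval': True})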

Figure 2. Model predictions and 95% observation interval for the CcaS/CcaR model. Also shown are the initial and optimal data used for fitting the model.

4. Discussion

Optimal experimental design (OED) consists of a well-established set of methodologies with a long history of success. However, it remains a challenge to translate OED methods into practice, especially for evolving experimental disciplines such as systems biology. NLoed aims to make OED more accessible to systems biologists by providing OED methods in an easy-to-use and open-source package. By combining OED methods with general model building and diagnostic algorithms, NLoed provides a complete modeling workflow. NLoed’s use of state-of-the-art automatic differentiation tools, via CasADi, improves performance and makes the package easily extensible. In summary, NLoed is an ideal tool for rapid iteration of experimental design and model building, where more computationally intensive Bayesian OED methods may not be practical.

Currently, NLoed only implements local asymptotic optimization criteria for parameter accuracy. NLoed is therefore best suited to designing large, iterative characterization experiments. In these cases, when the sample size is large, the asymptotic and local approximations are expected to perform well, especially when used for sequential design. NLoed’s focus on parameter accuracy also means that it is especially useful for precise modeling of well-understood natural systems or synthetically engineered systems where the model structure is reasonably well determined: it is better suited to precision characterization of well-studied systems as opposed to early investigation of novel ones.

In future releases, we hope to expand NLoed’s capabilities, including new design methods such as pseudo-Bayesian techniques for addressing model and parameter uncertainty.35 This will improve NLoed’s ability to accommodate uncertainty, especially for early experimental work on novel systems. Regardless, in its current form, NLoed can make it easier for experimentalists to more efficiently allocate laboratory resources while also providing theoretical groups with tools to study the effects of experimental design on model identifiability.

Acknowledgments

The plasmids used in this work were generously provided by Jeffrey Tabor’s group at Rice University, as described in their previous work.30

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acssynbio.2c00131.

  • Images of the light array apparatus used in this work; standard deviation of GFP measurements plotted against the mean of the GFP measurements in each light condition for preliminary experiments; and convergence diagnostic plot generated for aceBayes design optimization (ZIP)

  • Experimental materials and methods; output variance; code for main text figures; implementation of dynamic models in NLoed; timing comparison with AMIGO2; and timing comparison with aceBayes (PDF)

This work was supported by a Discovery Grant from Canada’s Natural Sciences and Engineering Research Council (NSERC).

The authors declare no competing financial interest.

Supplementary Material

sb2c00131_si_001.zip (5.6MB, zip)
sb2c00131_si_002.pdf (6.2MB, pdf)

References

  1. Hagen D. R.; White J. K.; Tidor B. Convergence in parameters and predictions using computational experimental design. Interface Focus 2013, 3, 20130008. 10.1098/rsfs.2013.0008.
  2. Braniff N.; Ingalls B. New opportunities for optimal design of dynamic experiments in systems and synthetic biology. Curr. Opin. Syst. Biol. 2018, 9, 42–48. 10.1016/j.coisb.2018.02.005.
  3. Apgar J. F.; Witmer D. K.; White F. M.; Tidor B. Sloppy models, parameter uncertainty, and the role of experimental design. Mol. BioSyst. 2010, 6, 1890–1900. 10.1039/b918098b.
  4. Franceschini G.; Macchietto S. Model-based design of experiments for parameter precision: State of the art. Chem. Eng. Sci. 2008, 63, 4846–4872. 10.1016/j.ces.2007.11.034.
  5. Kreutz C.; Timmer J. Systems biology: experimental design. FEBS J. 2009, 276, 923–942. 10.1111/j.1742-4658.2008.06843.x.
  6. Chakrabarty A.; Buzzard G. T.; Rundell A. E. Model-based design of experiments for cellular processes. Wiley Interdiscip. Rev.: Syst. Biol. Med. 2013, 5, 181–203. 10.1002/wsbm.1204.
  7. Bandara S.; Schlöder J. P.; Eils R.; Bock H. G.; Meyer T. Optimal experimental design for parameter estimation of a cell signaling model. PLoS Comput. Biol. 2009, 5, e1000558. 10.1371/journal.pcbi.1000558.
  8. Ruess J.; Parise F.; Milias-Argeitis A.; Khammash M.; Lygeros J. Iterative experiment design guides the characterization of a light-inducible gene expression circuit. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 8148–8153. 10.1073/pnas.1423947112.
  9. Groemping U. CRAN Task View: Design of Experiments (DoE) & Analysis of Experimental Data, 2020.
  10. Atkinson A. C.; Donev A. N.; Tobias R. D. Optimum Experimental Designs, with SAS; Oxford University Press, 2007; Vol. 34.
  11. Nyberg J.; Ueckert S.; Strömberg E. A.; Hennig S.; Karlsson M. O.; Hooker A. C. PopED: an extended, parallelized, nonlinear mixed effects models optimal design tool. Comput. Methods Progr. Biomed. 2012, 108, 789–805. 10.1016/j.cmpb.2012.05.005.
  12. Bazzoli C.; Retout S.; Mentré F. Design evaluation and optimisation in multiple response nonlinear mixed effect models: PFIM 3.0. Comput. Methods Progr. Biomed. 2010, 98, 55–65. 10.1016/j.cmpb.2009.09.012.
  13. Nyberg J.; Bazzoli C.; Ogungbenro K.; Aliev A.; Leonov S.; Duffull S.; Hooker A. C.; Mentré F. Methods and software tools for design evaluation in population pharmacokinetics–pharmacodynamics studies. Br. J. Clin. Pharmacol. 2015, 79, 6–17. 10.1111/bcp.12352.
  14. Clyde M. A. An Object-Oriented System for Bayesian Nonlinear Design Using XLISP-STAT. Technical Report; University of Minnesota, 1993.
  15. Overstall A. M.; Woods D. C. Bayesian design of experiments using approximate coordinate exchange. Technometrics 2017, 59, 458–470. 10.1080/00401706.2016.1251495.
  16. Overstall A. M.; Woods D. C.; Parker B. M. Bayesian optimal design for ordinary differential equation models with application in biological science. J. Am. Stat. Assoc. 2020, 115, 583. 10.1080/01621459.2019.1617154.
  17. Gilman J.; Walls L.; Bandiera L.; Menolascina F. Statistical design of experiments for synthetic biology. ACS Synth. Biol. 2021, 10, 1–18. 10.1021/acssynbio.0c00385.
  18. Gilman J.; Zulkower V.; Menolascina F. Using a design of experiments approach to inform the design of hybrid synthetic yeast promoters. In Computational Methods in Synthetic Biology; Springer, 2021; pp 1–17.
  19. Balsa-Canto E.; Bandiera L.; Menolascina F. Optimal experimental design for systems and synthetic biology using AMIGO2. In Synthetic Gene Circuits; Springer, 2021; pp 221–239.
  20. Balsa-Canto E.; Banga J. R. AMIGO, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics 2011, 27, 2311–2313. 10.1093/bioinformatics/btr370.
  21. Balsa-Canto E.; Henriques D.; Gábor A.; Banga J. R. AMIGO2, a toolbox for dynamic modeling, optimization and control in systems biology. Bioinformatics 2016, 32, 3357–3359. 10.1093/bioinformatics/btw411.
  22. Bandiera L.; Gomez-Cabeza D.; Gilman J.; Balsa-Canto E.; Menolascina F. Optimally designed model selection for synthetic biology. ACS Synth. Biol. 2020, 9, 3134–3144. 10.1021/acssynbio.0c00393.
  23. Gomez-Cabeza D.; Bandiera L.; Menolascina F. BOMBs.jl: A Low-Code Julia Package for the Simulation, Bayesian Inference and Optimal Experimental Design of Biomodels. 2022. https://docs.juliahub.com/BOMBs/MvNlh/0.1.2/.
  24. De Pauw D. J. W.; Vanrolleghem P. A. Avoiding the finite difference sensitivity analysis deathtrap by using the complex-step derivative approximation technique. International Congress on Environmental Modelling and Software, 2006; Vol. 24.
  25. Andersson J. A. E.; Gillis J.; Horn G.; Rawlings J. B.; Diehl M. CasADi – A software framework for nonlinear optimization and optimal control. Math. Program. Comput. 2019, 11, 1–36. 10.1007/s12532-018-0139-4.
  26. Wächter A.; Biegler L. T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57. 10.1007/s10107-004-0559-y.
  27. Lakatos E.; Ale A.; Kirk P. D. W.; Stumpf M. P. H. Multivariate moment closure techniques for stochastic kinetic models. J. Chem. Phys. 2015, 143, 094107. 10.1063/1.4929837.
  28. Elf J.; Ehrenberg M. Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Res. 2003, 13, 2475–2484. 10.1101/gr.1196503.
  29. Fedorov V. V.; Leonov S. L. Optimal Design for Nonlinear Response Models; CRC Press, 2013.
  30. Schmidl S. R.; Sheth R. U.; Wu A.; Tabor J. J. Refactoring and optimization of light-switchable Escherichia coli two-component systems. ACS Synth. Biol. 2014, 3, 820–831. 10.1021/sb500273n.
  31. Olson E. J.; Tzouanas C. N.; Tabor J. J. A photoconversion model for full spectral programming and multiplexing of optogenetic systems. Mol. Syst. Biol. 2017, 13, 926. 10.15252/msb.20167456.
  32. Olson E. J.; Hartsough L. A.; Landry B. P.; Shroff R.; Tabor J. J. Characterizing bacterial gene circuit dynamics with optically programmed gene expression signals. Nat. Methods 2014, 11, 449–455. 10.1038/nmeth.2884.
  33. Bates D. M.; Watts D. G. Nonlinear Regression Analysis and Its Applications; Wiley: New York, 1988; Vol. 2.
  34. Villaverde A. F.; Barreiro A.; Papachristodoulou A. Structural identifiability of dynamic systems biology models. PLoS Comput. Biol. 2016, 12, e1005153. 10.1371/journal.pcbi.1005153.
  35. Schenkendorf R.; Kremling A.; Mangold M. Optimal experimental design with the sigma point method. IET Syst. Biol. 2009, 3, 10–23. 10.1049/iet-syb:20080094.
