Correction to GAMES: A dynamic model development workflow for rigorous characterization of synthetic genetic systems

Kate E Dray; Joseph J Muldoon; Niall M Mangan; Neda Bagheri; Joshua N Leonard

doi:10.1021/acssynbio.2c00082

Author manuscript; available in PMC: 2022 May 19.

Published in final edited form as: ACS Synth Biol. 2022 Mar 10;11(4):1699–1704. doi: 10.1021/acssynbio.2c00082

PMCID: PMC9119373 NIHMSID: NIHMS1796301 PMID: 35271255

After publication, we identified two minor mistakes in the code for the GAMES workflow. Each subtly affected our analysis of the example case study but had no effect on the GAMES workflow. Here we describe those mistakes and discuss how the resulting corrections modify the case study analysis.

Correction 1:

There are two locations in the workflow where noise is added to each data point in the training data: to generate the parameter estimation method (PEM) evaluation data, and to calculate the threshold used for the parameter profile likelihood (PPL). In both locations, the value of the noise added to each data point is selected from a distribution centered at zero with a standard deviation equal to the standard error associated with the data point (assuming, for our case study, that the data point represents the mean value for three replicates). We set the standard deviation, $σ_{SD}$, for each data point to 0.05, so the standard error, $σ_{SE}$, is calculated with Equation C1, where $n_{replicates}$ is the number of replicates:

$$σ_{SE} = \frac{σ_{SD}}{\sqrt{n_{replicates}}} \qquad \text{(Equation C1)}$$
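As a minimal sketch of Equation C1 (not the GAMES code itself), the snippet below computes the standard error for the case study values and adds zero-mean Gaussian noise to a few data points. The function names, the `training_data` values, and the random seed are hypothetical and chosen only for illustration; `sigma_without_sqrt` shows the value obtained when the square root is omitted.

```python
import math
import random


def standard_error(sigma_sd, n_replicates):
    """Equation C1: standard error of the mean from the standard deviation."""
    return sigma_sd / math.sqrt(n_replicates)


def add_noise(data, sigma_se, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma_se to each point."""
    return [value + rng.gauss(0.0, sigma_se) for value in data]


sigma_sd = 0.05      # standard deviation set for each data point in the case study
n_replicates = 3     # each data point is the mean of three replicates

sigma_se = standard_error(sigma_sd, n_replicates)
sigma_without_sqrt = sigma_sd / n_replicates  # value if the square root is omitted

rng = random.Random(0)                      # hypothetical seed, for reproducibility
training_data = [0.10, 0.35, 0.70, 0.95]    # hypothetical normalized data points
noisy_data = add_noise(training_data, sigma_se, rng)

print(round(sigma_se, 3), round(sigma_without_sqrt, 3))  # prints: 0.029 0.017
```

The corrected value of about 0.029 matches the noise distribution N(0, 0.029²) quoted in the Figure 4 caption.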

The original published code omitted the square root operation in Equation C1, so each added noise value was drawn from a distribution that was slightly narrower than the correct one. We corrected the code in v1.0.2 and repeated the relevant PEM evaluation and PPL simulations. The results changed in two minor ways: the quantitative value of the PPL threshold changed for all parameters in the example study, and the identifiability classification changed for one parameter in one model, which affected the interpretation of that parameter. The other qualitative interpretations remain the same, and no changes were made to the GAMES workflow.

With the correction, the PEM evaluation results are very similar to the previous results. The threshold used to define the PEM evaluation criterion remains the same (R² = 0.99), and this threshold is satisfied for all models (Figure 4c model A, Figure S5a model B, Figure S6a model C, Figure S7a model D).
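The check against the R² threshold can be sketched as follows. This is an illustration under stated assumptions rather than the GAMES implementation: R² is computed here as the coefficient of determination between a PEM evaluation data set and the corresponding simulated values, and the criterion is assumed to pass when the best R² for every PEM evaluation data set meets the threshold. The function names and the numerical values are hypothetical.

```python
def r_squared(observed, simulated):
    """Coefficient of determination between observed and simulated values."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pem_criterion_satisfied(best_r2_per_data_set, r2_threshold=0.99):
    """Assumed rule: every PEM evaluation data set must reach the threshold."""
    return all(r2 >= r2_threshold for r2 in best_r2_per_data_set)


# Hypothetical PEM evaluation data and the simulation from an optimized parameter set
pem_data = [0.10, 0.35, 0.70, 0.95]
simulated = [0.11, 0.34, 0.71, 0.94]

r2 = r_squared(pem_data, simulated)
print(pem_criterion_satisfied([r2, 0.995]))  # both values are at least 0.99
print(pem_criterion_satisfied([r2, 0.95]))   # one data set falls short
```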

Figure 4. Evaluate parameter estimation method. **(a)** Module 1 workflow for evaluating the PEM using simulated training data. A model must pass the PEM evaluation criterion before moving on to Module 2. **(b, c)** Module 1 case study for a hypothetical crTF. **(b)** Generating the PEM data. A global search of 1000 parameter sets was filtered by $χ^{2}$ with respect to the training data and the 8 parameter sets with the lowest $χ^{2}$ values were used as reference parameters to generate PEM evaluation data. For each data set, technical error was added using a noise distribution of N(0, 0.029²). Triangle data points are PEM evaluation data. **(c)** Determination of the PEM evaluation criterion. For each PEM evaluation data set, a global search with 100 randomly chosen parameter sets was used to choose 10 parameter sets to use as initial guesses for optimization. The optimized parameter sets and cost function from each of the PEM evaluation problems were used to evaluate the PEM evaluation criterion. Each parameter was allowed to vary across three orders of magnitude in either direction of the reference parameter value, except for n, which was allowed to vary across [10⁰, 10^0.6]. Results are shown only for parameter sets yielding $χ^{2}$ values within the bottom (best) 10% of $χ^{2}$ values (to the left of the pink dotted line in **Figure S2b**) achieved in the initial global search with respect to the training data (Module 1.1). Only parameter sets yielding R² ≥ 0.90 are included on the plot to more clearly show data points with R² values that exceed R²_opt. Both of these filtering strategies apply to all plots of PEM evaluation data in this tutorial.

With the correction, the main difference observed for the PPL results is that the calculated thresholds for each model are higher than in the original analysis. This change is consistent with our understanding of the PPL threshold, which is related to the extent of overfitting that is possible given a model, a training data set, and the associated measurement error. For models A (Figure 7b, Figure 8b, Figure S4a) and B (Figure 8b, Figure S5c), the increased threshold is the only substantial difference between the corrected results and previous results; all PPL shapes and parameter classifications remain the same.
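The dependence of the threshold on overfitting can be illustrated with a small simulation. The sketch below is not the GAMES code: it replaces the case study model with a hypothetical one-parameter model, y = θ·x, fit in closed form by least squares, and it assumes the threshold is the 99th percentile of χ²(θ_ref) − χ²(θ_fit) over the noise realizations. All names, the `x` values, and the seed are illustrative assumptions.

```python
import random
import statistics


def simulate(theta, x):
    """Hypothetical stand-in model: y = theta * x."""
    return [theta * xi for xi in x]


def chi2(data, simulated, sigma):
    """Sum of squared residuals, each weighted by the measurement error."""
    return sum(((d - s) / sigma) ** 2 for d, s in zip(data, simulated))


def fit_theta(x, data):
    """Closed-form least-squares estimate for y = theta * x."""
    return sum(xi * di for xi, di in zip(x, data)) / sum(xi * xi for xi in x)


def ppl_threshold(theta_ref, x, sigma, n_realizations=1000, alpha=0.01, seed=0):
    """Estimate the threshold from chi2(theta_ref) - chi2(theta_fit)."""
    rng = random.Random(seed)
    reference = simulate(theta_ref, x)
    differences = []
    for _ in range(n_realizations):
        # One noise realization of the reference data
        noisy = [r + rng.gauss(0.0, sigma) for r in reference]
        theta_fit = fit_theta(x, noisy)
        chi2_ref = chi2(noisy, reference, sigma)
        chi2_fit = chi2(noisy, simulate(theta_fit, x), sigma)
        differences.append(chi2_ref - chi2_fit)  # amount of overfitting
    percentiles = statistics.quantiles(differences, n=100)  # 99 cut points
    index = round((1 - alpha) * 100) - 1  # 99th percentile for alpha = 0.01
    return percentiles[index], differences


x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
threshold, differences = ppl_threshold(theta_ref=1.0, x=x, sigma=0.05 / 3 ** 0.5)
print(round(threshold, 1))
```

Because the toy fit is exact, every difference is non-negative. With a numerical optimizer that sometimes stops at a local minimum, some differences can be slightly negative.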

Figure 7. Assess parameter identifiability. **(a)** Module 3 workflow for evaluating and refining parameter identifiability through the profile likelihood approach. Depending on the results of the parameter identifiability analysis, the next step is either experimental design (Module 0), model reduction (Module 0), or model comparison (Module 4). **(b–d)** Module 3 case study for a hypothetical crTF. **(b)** Application of the profile likelihood approach to the model defined in **Figure 2**. The calibrated parameter set from a parameter estimation run with 1000 global search parameter sets and 100 initial guesses for optimization was used as the starting point (represented in blue). Parameters were allowed to vary across three orders of magnitude in either direction of the reference parameter value, except for n, which was allowed to vary across [10⁰, 10^0.6]. An adaptive step method (**Supplementary Note 1**) was used to determine each step size. The threshold is defined as the 99% confidence interval of the practical $χ_{df}^{2}$ distribution ($Δ_{1-α} = 5.1$). **(c)** Plots of parameter relationships along the profile likelihood associated with m. We consider a range of possible values of the unidentifiable parameter (m) and plot these values against recalibrated values of other model parameters (k_m, e, n, b, k_bind). **(d)** Plots of internal model states considering a range of possible values of unidentifiable parameter m. Time courses represent the trajectory of each state variable in the model as a function of m choice. Each trajectory was generated by holding m constant at the given value and re-optimizing all other free parameters (results are from the same simulations used to plot the PPL results in b). Data are shown for these conditions: 50 ng DNA-binding domain plasmid, 50 ng activation domain plasmid, and a saturating ligand dose (100 nM).

The corrected PPL shapes and parameter classifications for model C were also in agreement with the previous results (Figure 8b, Figure S6c), with the exception of the parameter m*. This parameter was previously deemed identifiable but now appears practically unidentifiable, because the PPL reaches the threshold in the negative direction but not in the positive direction. However, this corrected result has no impact on downstream analysis because m* is still classified as identifiable for the final model (model D). The classification of m* as practically unidentifiable for model C is reasonable given that the increased PPL threshold necessitates that higher m* values be traversed when determining the PPL. As m* is a ratio between the parameters m and b, once m* reaches a sufficiently high value such that m >> b, increasing m* further has no meaningful effect on the agreement between the training data and simulated data. This interpretation explains why m* does not reach the threshold in the positive direction for model C with the correction included here.

The corrected results for model D are very similar to the previous results (Figure 8b, Figure S7c). All parameter classifications remain the same, and all parameters are identifiable. The qualitative shape of the PPL for m* is similar to the shape observed for m* in model C (with the correction), but in model D, the PPL crosses the threshold in both the negative and positive directions. This is reasonable because model D has fewer free parameters (four free parameters) than does model C (five free parameters), and therefore model D has a lower calculated PPL threshold, enabling the PPL for m* to cross the threshold in the positive direction.
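The classification logic described for m* can be sketched as a threshold-crossing check in each direction. The example below is an illustration only: the profile values are hypothetical (they are not the case study results), they are assumed to be expressed relative to the minimum χ², and the same hypothetical profile is reused for both thresholds purely to show how a lower threshold can change the classification. The thresholds 7.9 and 7.0 are the values reported for models C and D in Figures S6c and S7c.

```python
def crosses_threshold(profile, threshold):
    """True if any profile value (relative to the minimum chi2) reaches the threshold."""
    return any(value >= threshold for value in profile)


def classify_parameter(negative_profile, positive_profile, threshold):
    """Classify a parameter from its profile in each direction."""
    negative = crosses_threshold(negative_profile, threshold)
    positive = crosses_threshold(positive_profile, threshold)
    if negative and positive:
        return "identifiable"
    if negative or positive:
        return "practically unidentifiable"
    # This sketch does not distinguish further when neither direction crosses
    return "unidentifiable"


# Hypothetical profile: rises steeply toward low values, plateaus toward high values
negative_profile = [0.5, 2.1, 5.8, 9.3]
positive_profile = [0.4, 3.0, 6.5, 7.4, 7.4, 7.4]

print(classify_parameter(negative_profile, positive_profile, threshold=7.9))
# prints: practically unidentifiable
print(classify_parameter(negative_profile, positive_profile, threshold=7.0))
# prints: identifiable
```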

Correction 2:

We also noted a minor mistake in the model D case study. For model D, an incorrect value for k_bind (the fixed value of 1 rather than the reference value of 0.05) was used to define the reference parameter set and calculate the PPL threshold. This mistake was corrected before generating the simulation results reported here. Correcting this value led to some parameter sets having higher $χ^{2}(θ_{fit})$ values than $χ^{2}(θ_{ref})$ values (Figure S7c) because k_bind cannot be fit to the reference parameter value for each noise realization. However, the resulting reduced model with k_bind = 1 still yields very similar agreement between the training data and simulated data (Figure S7b), which shows that fixing k_bind to 1 (and not to the reference value of 0.05, which would be unknown in a practical situation when the reference parameters do not exist) does not significantly affect the results. This phenomenon, in which some parameter sets have slightly higher $χ^{2}(θ_{fit})$ values than $χ^{2}(θ_{ref})$ values, was also observed in the original results for all models but to a lesser extent. In general, slightly negative values for $χ^{2}(θ_{ref}) - χ^{2}(θ_{fit})$ can be attributed to the optimization algorithm finding local minima (that have only slightly different $χ^{2}$ values than the global minimum) to define $χ^{2}(θ_{fit})$ for some noise realizations.

Conclusions:

These corrections affected our interpretation of the example case study but had no effect on the GAMES workflow itself. The code used to define the case study example has been updated and annotated on GitHub: https://github.com/leonardlab/GAMES.

Supplementary Material

Figure S4. Additional PPL results for Model A.

(a) Evaluation of the $χ^{2}$ distribution via a simulation study. 1000 individual noise realizations were generated. Parameters were individually estimated for each of the noise realizations to calculate $χ^{2}(θ_{fit})$. Reference parameters were used to calculate $χ^{2}(θ_{ref})$. The difference between these values, $χ^{2}(θ_{ref}) - χ^{2}(θ_{fit})$, represents the amount of overfitting for each noise realization. $Δ_{1-α}$ (blue dotted line) was determined by evaluating the 99% confidence interval ($α$ = 0.01) of the distribution ($Δ_{1-α} = 5.1$). (b) Three-dimensional plot of k_m, b, and m along the unidentifiability associated with m. The surface is smooth, indicating dependencies between the three parameters. The logarithm (log₁₀($θ$)) of each parameter is plotted.

NIHMS1796301-supplement-Figure_S4.jpg (1.2 MB, jpg)

Figure S5. PEM evaluation, parameter estimation, and determination of confidence threshold for Model B.

(a) PEM evaluation criterion with 1000 parameter sets in the global search and 100 initial guesses. Results are shown only for parameter sets yielding R² ≥ 0.90. The PEM evaluation criterion is satisfied. (b) Best fit to the training data using the calibrated parameter set. The visual inspection criterion is satisfied. Parameter values are in Supplementary Table 2. (c) Determination of the confidence threshold for PPL calculations ($Δ_{1-α} = 7.7$).

NIHMS1796301-supplement-Figure_S5.jpg (1.6 MB, jpg)

Figure S6. PEM evaluation, parameter estimation, and determination of confidence threshold for Model C.

(a) PEM evaluation criterion with 1000 parameter sets in the global search and 100 initial guesses. The PEM evaluation criterion is satisfied. (b) Best fit to the training data using the calibrated parameter set. The visual inspection criterion is satisfied. Parameter values are in Supplementary Table 2. (c) Determination of the confidence threshold for PPL calculations ($Δ_{1-α} = 7.9$).

NIHMS1796301-supplement-Figure_S6.jpg (1.6 MB, jpg)

Figure S7. PEM evaluation, parameter estimation, and determination of confidence threshold for Model D.

(a) PEM evaluation criterion with 1000 parameter sets in the global search and 100 initial guesses. The PEM evaluation criterion is satisfied. (b) Best fit to the training data using the calibrated parameter set. The visual inspection criterion is satisfied. Parameter values are in Supplementary Table 2. (c) Determination of the confidence threshold for PPL calculations ($Δ_{1-α} = 7.0$).

NIHMS1796301-supplement-Figure_S7.jpg (1.6 MB, jpg)

Figure 8. Refinement of parameter identifiability using experimental design and model reduction.
