Uncovering specific mechanisms across cell types in dynamical models

Adrian L Hauber; Marcus Rosenblatt; Jens Timmer

doi:10.1371/journal.pcbi.1010867

2023 Sep 13;19(9):e1010867.

Editor: Kiran R Patil

PMCID: PMC10519600 PMID: 37703301

Abstract

Ordinary differential equations are frequently employed for mathematical modeling of biological systems. The identification of mechanisms that are specific to certain cell types is crucial for building useful models and for gaining insights into the underlying biological processes. Regularization techniques, including the LASSO (least absolute shrinkage and selection operator), have been proposed and applied to identify mechanisms specific to two cell types, e.g., healthy and cancer cells. However, when analyzing more than two cell types, these approaches are not consistent, and require the selection of a reference cell type, which can affect the results. To make the regularization approach applicable to identifying cell-type specific mechanisms in any number of cell types, we propose to incorporate the clustered LASSO into the framework of ordinary differential equation modeling by penalizing the pairwise differences of the logarithmized fold-change parameters encoding a specific mechanism in different cell types. The symmetry introduced by this approach renders the results independent of the reference cell type. We discuss the necessary adaptations of state-of-the-art numerical optimization techniques and the process of model selection for this method. We assess the performance with realistic biological models and synthetic data, and demonstrate that it outperforms existing approaches. Finally, we also exemplify its application to published biological models including experimental data, and link the results to independent biological measurements.

Author summary

Mathematical models enable insights into biological systems beyond what is possible in the wet lab alone. However, constructing useful models can be challenging, since they both need a certain amount of complexity to adequately describe real-world observations, and simultaneously enough simplicity to enable understanding of these observations and precise predictions. Regularization techniques were suggested to tackle this challenge, especially when building models that describe two different types of cells, such as healthy and cancer cells. Typically, both cell types have a large portion of biological mechanisms in common, and the task is to identify the relevant differences that need to be included into the model.

For more than two types of cells, the existing approaches are not readily applicable, because they require defining one of the cell types as reference, which potentially influences the results. In this work, we present a regularization method that is independent from the choice of a reference. We demonstrate its working principle and compare its performance to existing approaches. Since we implemented this method in a freely available software package, it is accessible to a broad range of researchers and will facilitate the construction of useful mathematical models for multiple types of cells.

I. Introduction

Mechanistic modeling by means of ordinary differential equations (ODEs) has become a widespread method to understand and discover systemic behavior and dynamic information processing of complex biological systems. Along with the development of experimental techniques such as quantitative high-throughput measurements, these mathematical models tend to become more and more complex with hundreds of parameters that have to be calibrated to match experimental data. This manifests in (i) more elaborate biological questions that can only be addressed by taking into account more biological components and corresponding mechanisms [1–3], (ii) accounting for broad ranges of input doses and time scales to elucidate the full dynamical information [4–7], but also (iii) the need for such models to be valid across multiple biological systems, e.g., different cell types, model organisms, or patients [8], which roughly multiplies the number of involved model parameters by the number of systems.

In the setting of n cell types, a typical assumption is that their dynamics can be described by ODE systems with an identical structure but potentially different parameter values, e.g., accounting for mutations [9] or copy number variations [10] between different cell types. The challenge is then to cluster the cell types into groups that share the same value for a certain parameter, which reduces the overall number of parameters. From the opposite perspective, the task of balancing model complexity for ODE models of n related biological systems, e.g., cell types, can also be understood as the task of determining which of the involved parameters need to be specific to the cell type in order to both explain experimental observations and keep model complexity as low as possible. This parameter selection is related to the general topic of feature selection or model discrimination [11], and was recently solved for the case n = 2: [12] transferred the concept of the LASSO (least absolute shrinkage and selection operator) for regression [13] to the field of parameter estimation in ODE models and employed an optimization strategy outlined in [14]. They adapted Matlab’s trust-region optimizer lsqnonlin to be capable of handling the discontinuous derivative occurring in the involved L₁-norm at zero. Following on this study, [15] compared different regularization terms that induce sparsity in the system, with the interesting result that the L_0.8-penalization outperforms both the classical L₁ approach and the elastic net, i.e., a regularization with a combination of L₁ and L₂ norms, with respect to reliability of detecting sparsity and optimization performance.

For the case of n>2 cell types, the methodology of [12] and [15] can also be applied. However, since the regularization term is no longer symmetric with respect to changing the labeling of the cell types, the choice of the reference cell type potentially influences the result of the regularization. To resolve this asymmetry, we propose to employ a regularization function that includes penalty terms for differences in fold change parameters between all n(n−1)/2 possible pairs of cell types. This corresponds to the regularization applied in the clustered LASSO that was proposed for regression modeling [16]. In the context of mechanistic ODE models, it allows, e.g., clustering of cell types into groups that share identical parameters, and thereby enables the discovery of any kind of sparsity structure in the set of parameters. When sensible groups of parameters can be predefined by including prior knowledge, the grouped LASSO [17] can be applied. However, in the setting of different healthy or cancer cell-lines, which typically include recurrent mutations [18], the flexible identification of sparsity enabled by our approach is essential. Also, it is not necessary to define parameter groups beforehand. An overview of the applicability of the different methods is provided in Table 1.

Table 1. Examples of applicability of regularization methods for different parameter subgroup structures.

Description	Cell Type 1 (Reference)	Cell Type 2	Cell Type 3	Suitable Method
Subgroup including reference	1	1	2	LASSO, Clustered LASSO
No subgroups	1	2	3	Group LASSO, Clustered LASSO
Subgroup excluding reference	1	2	2	Clustered LASSO

Within this publication, we provide a systematic approach to detect and quantify cell type-specific parameters and thereby enable a statistically sound reduction of the remaining non-specific parameters in ODE models. The main goal of inferring cell type-specific and non-specific parameters is to identify mechanisms that are different between related biological systems and those that are shared across them. We demonstrate the necessary adaptations to the optimization algorithm and model selection to incorporate the symmetric regularization function into the framework of ODE modeling. Finally, we provide an assessment of the performance of the method we propose, and apply it to biological data.

II. Problem statement

Parameter estimation in dynamical systems

Biochemical pathway models can be formulated as dynamical systems by means of ordinary differential equations (ODEs) which are based on a priori knowledge about underlying mechanisms:

\dot{\vec{x}} (t) = \vec{f} (\vec{x} (t), u (t), \vec{p})

Such models typically comprise unknown but constant parameters $\vec{p}$ , which represent, e.g., reaction rate constants, or initial conditions of the dynamical system. The biochemical species are contained in the state vector $\vec{x}$ . Experimental perturbations are incorporated through the input u(t). Maximum likelihood estimation combined with high performance numerical optimization methods provides a statistically sound and efficient way to infer the values of parameters from data $\{ y^{*}_{i, t} \}$ with normally distributed errors $ϵ_{i, t} \sim N (0, σ_{i, t}^{2})$ . The model states are linked to predictions of experiments y_i via the observation function:

y_{i} (t, \vec{p}) = g_{i} (\vec{x} (t, \vec{p}), \vec{p}) + ϵ_{i, t}

Model calibration is equivalent to minimization of

χ^{2} (\vec{p}) ≔ \sum_{i, t} \frac{{[y^{*}_{i, t} - y_{i} (t, \vec{p})]}^{2}}{σ_{i, t}^{2}},

(1)

where all parameters are usually estimated on the logarithmic scale, rendering them strictly positive.

Multiple cell types as related biological systems

A common application for regularization in systems biology is the identification of cell-type specific parameters in ODE models [9,12]. In practice, such a model is initially constructed and calibrated with the data of only one cell type, which we will refer to as the reference cell type. Typically, one assumes that the other cell types share at least the model structure with the reference cell type, if not exhibit identical behavior. Therefore, the starting point for modeling of the other cell types are copies of the model for the reference cell type, which, however, are allowed to comprise different parameter values for the different cell types. When a priori knowledge on common parameter values, e.g., the time scale of a protein degradation, is available, it is also possible to allow only a subset of parameters to be specific to the cell types [19].

Let $p_{i}^{(j)}, j = 1,2, \dots, n$ denote mechanistically equivalent parameters in n models for n different cell types. For example, $p_{i}^{(j)}$ might have one specific value in wild-type cells, and a different value in mutated cells. Throughout this paper, we will denote the reference cell type with j = 1. Consequently, the cell type-specific parameters read $p_{i}^{(j)} = p_{i}^{(1)} {\tilde{r}}_{i}^{(j)}$ , where ${\tilde{r}}_{i}^{(j)}$ represents the fold change that relates cell type j to the reference for parameter i, and ${\tilde{r}}_{i}^{(1)} = 1$ . On the logarithmic scale, the transformation that relates the cell type-specific parameters to those of the reference cell type reads

\log p_{i}^{(j)} = \log p_{i}^{(1)} + r_{i}^{(j)}, r_{i}^{(j)} = \log {\tilde{r}}_{i}^{(j)} .

(2)

LASSO regularization for the case of n = 2 cell types

In systems biology, a common application of regularization is the identification of shared mechanisms or mutations in a biochemical reaction network across multiple cell types. The term regularization refers to amending the objective function L by an additional function $ν (\vec{r})$ :

L (\vec{p}, \vec{r}, λ) = χ^{2} (\vec{p}, \vec{r}) + λ ν (\vec{r}),

(3)

where λ is the regularization strength, a constant that is unknown a priori. To find common mechanisms across multiple cell types, LASSO regularization can be applied, which penalizes deviations of the logarithmized fold-change parameters from zero via their L₁ norm (Fig 1A; [12]), i.e.,

ν (\vec{r}) = \sum_{i, j} | r_{i}^{(j)} |,

(4)

which has proven to be useful in the context of (logical) ODE models [9,20–24]. Further, it needs to be emphasized that in the context of ODE models with log-transformed parameters, the L₁ norm has been reported as suboptimal due to the alignment of equal penalization manifolds with manifolds of equal likelihood induced by the model structure, and employing an L_q pseudonorm with q = 0.8 was suggested [15].

Fig 1 — **(A)** Plots of the regularization function for the standard LASSO approach, i.e., penalization of logarithmized fold-change parameter values different from zero, and penalization landscapes for the two-dimensional case. **(B)** Regularization function for the symmetric penalization of fold-change differences via the clustered LASSO, evaluated each for the L₁ and L_0.8 penalizations.

The challenge of n>2 cell types

When n>2 cell types are analyzed, the problem is no longer symmetric w.r.t. the choice of the reference cell type, because by means of Eq (4), only deviations from the reference are penalized. Deviations between pairs of the n−1 non-reference cell types remain unaffected by the regularization, i.e., shared mechanisms between cell types that do not include the reference cannot be detected. Therefore, the result of regularization potentially depends on the a priori choice of the reference cell type.

Additionally, the objective function value at a given point in parameter space depends on the choice of the reference cell type. Consider the simple example of three cell types differing in their values of p at a point in parameter space that corresponds to log p taking the values 0.5, 0.75, and 1 in cell types 1, 2, and 3, respectively. Using cell type 1 or 3 as a reference, the contribution of the penalty (Eq 4, generalized to an exponent q) to the objective function at this point is |0.25|^q+|0.5|^q, while for cell type 2 as a reference, it is only 2|0.25|^q. Since the regularization is stronger in the former case, it is also more likely to alter the position of the optimum induced by the data contribution (Eq 3). Therefore, the endpoint of regularized optimization and also the resulting clusters potentially depend on the choice of the reference cell type.
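The arithmetic of this example can be checked with a short Python sketch. The function name `lasso_penalty` and the value q = 0.8 are illustrative assumptions; the text keeps q general.

```python
def lasso_penalty(log_p, reference, q=1.0):
    """Reference-based penalty: sum over j of |log p^(j) - log p^(reference)|^q."""
    return sum(abs(v - log_p[reference]) ** q for v in log_p)

log_p = [0.5, 0.75, 1.0]  # log p in cell types 1, 2, 3 (values from the text)
q = 0.8                   # illustrative exponent (assumption)

# Penalty for each possible choice of the reference cell type
penalties = [lasso_penalty(log_p, ref, q) for ref in range(3)]
for ref, pen in enumerate(penalties, start=1):
    print(f"reference = cell type {ref}: penalty = {pen:.4f}")
```

Choosing cell type 2 as the reference yields the smallest penalty, which illustrates why the regularized optimum can shift with the choice of reference.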

III. Methods

Symmetric penalization of fold-changes

The regularization function in Eq (4) induces a dependence on the choice of the reference cell type, because it only penalizes fold-changes that represent pairwise differences between each cell type and the reference cell type. Therefore, we propose to employ the clustered LASSO regularization function

ν (\vec{r}) = \sum_{i, j < k} {| r_{i}^{(j)} - r_{i}^{(k)} |}^{q},

(5)

taken from [16] and adapted to the setting of differential equation modeling. In this way, the pairwise differences between non-reference cell types are also penalized and symmetry is restored. In some sense, this can be interpreted as iterating over all possible reference cell types and summing up the penalty terms. In addition, the chosen regularization function (Eq 5) employs L_q penalization to facilitate optimization in the setting of ODE models with log-transformed parameters (Fig 1B). Note that the penalization introduced by Eq (4) is recovered through the terms with j = 1 for q = 1, since ${\tilde{r}}_{i}^{(1)} = 1$ , i.e., $r_{i}^{(1)} = \log {\tilde{r}}_{i}^{(1)} = 0$ . With the proposed regularization function, clusters of any size within the n cell types that share the same parameter value are promoted, implying that they for example exhibit the same mutation or are governed by the same mechanism.

We propose to use the L_0.8 pseudonorm also with the clustered LASSO regularization function, since firstly it was found to be a suitable heuristic for the choice of q in the context of standard LASSO regularization [15], and secondly our analysis indicates a superior performance compared to q = 1 (Section IV). In contrast to [16], we employ the same regularization strength for all penalization terms corresponding to one parameter p_i, because the problem is completely symmetric w.r.t. the choice of the reference cell-line, and hence, penalization of fold-change parameters and differences of such must be treated equally.

Due to this symmetry, the choice of the reference cell-line has no effect on the outcome of the regularization procedure. Because defining a reference cell-line is still beneficial for technical reasons, it will be denoted as the technical reference in the following. For general limitations of the clustered LASSO, see [16].
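The independence from the reference can be illustrated with a minimal Python sketch of the pairwise penalty for a single parameter. The helper names `clustered_penalty` and `rereference` are hypothetical and not taken from the published implementation; the log p values are reused from the three-cell-type example above.

```python
from itertools import combinations

def clustered_penalty(r, q=0.8):
    """Pairwise penalty for one parameter: sum over j<k of |r^(j) - r^(k)|^q.

    `r` holds the log fold changes of all cell types, including the
    technical reference (whose entry is 0 by definition).
    """
    return sum(abs(a - b) ** q for a, b in combinations(r, 2))

def rereference(log_p, reference):
    """Log fold changes relative to the cell type at index `reference`."""
    return [v - log_p[reference] for v in log_p]

log_p = [0.5, 0.75, 1.0]
# The same penalty value results for every choice of the technical reference
values = [clustered_penalty(rereference(log_p, ref)) for ref in range(3)]
```

Because only differences between fold changes enter, shifting all of them by a common offset (which is what a change of reference amounts to) leaves the penalty unchanged.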

Adaptations to the optimization algorithm

Optimization algorithms, such as Matlab’s lsqnonlin, frequently require the objective function L to be formulated as a sum of squared residuals. In terms of data contribution (Eq 1), this reads $L (\vec{p}, \vec{r}, λ) = \sum_{i} \mathrm{res}_{i}^{2}$ . In our case of clustered LASSO regularization (Eq 5), an additional $\binom{n}{2} = n (n - 1) / 2$ residuals need to be considered, analogously to the case of the standard LASSO [12]:

\mathrm{res}_{m} = \sqrt{λ {| r_{i}^{(j)} - r_{i}^{(k)} |}^{q}}

The so-called sensitivities sres_lm = ∂res_m/∂p_l represent the rate of change of a residual w.r.t. a certain parameter, which is a necessary value to be handed over to the optimizer and is employed during each step of the optimization. Let $r_{i}^{(j)}$ be the l-th and $r_{i}^{(k)}$ be the n-th parameter of the model. The sensitivities associated with the regularization residual res_m then read:

\mathrm{sres}_{l m} = - \mathrm{sres}_{n m} = \frac{q}{2} \sqrt{λ {| r_{i}^{(j)} - r_{i}^{(k)} |}^{q - 2}} \, \mathrm{sgn} (r_{i}^{(j)} - r_{i}^{(k)}) \quad \text{for} \quad r_{i}^{(j)} - r_{i}^{(k)} \neq 0
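As a plausibility check, the regularization residual and its sensitivities can be written down in a few lines of Python and compared against a finite difference. This is a sketch; the function names and the numerical values of the arguments are illustrative assumptions.

```python
import math

def reg_residual(r_j, r_k, lam, q=0.8):
    """Regularization residual res_m = sqrt(lam * |r_j - r_k|^q)."""
    return math.sqrt(lam * abs(r_j - r_k) ** q)

def reg_sensitivities(r_j, r_k, lam, q=0.8):
    """Derivatives of res_m w.r.t. r_j and r_k, valid only for r_j != r_k."""
    d = r_j - r_k
    if d == 0:
        raise ValueError("sensitivity is undefined at r_j == r_k")
    s = 0.5 * q * math.sqrt(lam * abs(d) ** (q - 2)) * math.copysign(1.0, d)
    return s, -s  # equal magnitude, opposite sign

# Central finite difference in r_j as an independent check
h = 1e-6
s_j, s_k = reg_sensitivities(0.3, 0.1, lam=2.0)
fd = (reg_residual(0.3 + h, 0.1, 2.0) - reg_residual(0.3 - h, 0.1, 2.0)) / (2 * h)
```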

Optimality criterion in presence of regularized fold-change differences

Without regularization, the process of optimization can be considered complete when $\vec{\nabla} χ^{2} (\vec{p}) = 0$ , i.e., the objective function landscape induced by the data has no slope at the current parameter vector $\vec{p}$ . When optimizing with the regularization term for symmetric penalization of fold-changes (Eq 5), this criterion on the level of the individual parameters must be extended to

{\vec{\nabla}}_{\vec{p}} χ^{2} (\vec{p}, \vec{r}) = 0 \quad (6.1)

and either

{[{\vec{\nabla}}_{\vec{r}} L (\vec{p}, \vec{r})]}_{l - n} = 0 \quad \text{for} \quad | r_{i}^{(j)} - r_{i}^{(k)} | > 0 \quad (6.2)

or

| {[{\vec{\nabla}}_{\vec{r}} χ^{2} (\vec{p}, \vec{r})]}_{l - n} | < λ | {[\vec{\nabla} ν (\vec{r})]}_{l - n} | \quad \text{for} \quad | r_{i}^{(j)} - r_{i}^{(k)} | = 0 \quad (6.3),

where the subscript l-n denotes the gradient component in the direction of $\vec{e_{l}} - \vec{e_{n}}$ , with $\vec{e_{n}}$ being the unit vector along the n-th parameter axis.

Criterion 6.1 is the extension of the optimality criterion without regularization to an objective function that also depends on the fold-change parameters $\vec{r}$ , for which optimality is determined by one of the following two criteria: Either the slope of the objective function induced by the data in the direction of $r_{i}^{(j)} - r_{i}^{(k)}$ , and the slope of the regularization function in that direction exactly compensate each other at one point in parameter space, which is represented by criterion 6.2, or, the regularization function outweighs the data contribution in a point where two fold-change parameters are equal (Criterion 6.3).

The gradient of the regularization function (Eq 5) can be calculated:

{[\vec{\nabla} ν (\vec{r})]}_{l} = q {| r_{i}^{(j)} - r_{i}^{(k)} |}^{q - 1} \, \mathrm{sgn} (r_{i}^{(j)} - r_{i}^{(k)}) \quad \text{for} \quad r_{i}^{(j)} - r_{i}^{(k)} \neq 0 .

For $r_{i}^{(j)} - r_{i}^{(k)} = 0$ , the above expression diverges, which would introduce an optimum regardless of the data contribution $χ^{2} (\vec{p}, \vec{r})$ . To prevent optimization from getting stuck at this spurious optimum, gradients are evaluated not at this singularity, but at ϵ = 10⁻¹⁰ instead, and all $| r_{i}^{(j)} - r_{i}^{(k)} | < ϵ$ are considered zero [15].
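A minimal Python sketch of this thresholding and of the check in criterion 6.3 could look as follows. The function names are hypothetical, and the sketch assumes that the data slope and the penalty slope are taken along the same direction with the same normalization, a detail the text leaves open.

```python
EPS = 1e-10  # threshold from the text below which a difference counts as zero

def pair_is_clustered(r_j, r_k, eps=EPS):
    """True if the two fold-change parameters are considered equal."""
    return abs(r_j - r_k) < eps

def penalty_slope(r_j, r_k, q=0.8, eps=EPS):
    """Magnitude of d|r_j - r_k|^q / d r_j, evaluated no closer than eps to zero."""
    return q * max(abs(r_j - r_k), eps) ** (q - 1)

def criterion_6_3(chi2_slope, r_j, r_k, lam, q=0.8, eps=EPS):
    """True if the pair is clustered and the penalty slope outweighs the data slope."""
    return (pair_is_clustered(r_j, r_k, eps)
            and abs(chi2_slope) < lam * penalty_slope(r_j, r_k, q, eps))
```

For q = 0.8 and ϵ = 10⁻¹⁰, the penalty slope at a clustered pair is large but finite, so a sufficiently steep data contribution can still pull the pair apart.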

Implementation of the optimality criterion

The implementation of criterion 6.3 requires special attention: The termination of optimization due to arriving in a manifold where two fold-change parameters are equal and the regularization dominates the total objective function gradient (Criterion 6.3) can be implemented by manipulation of the sensitivity matrix such that the next optimization step does not change the value of $r_{i}^{(j)} - r_{i}^{(k)}$ . For standard LASSO regularization, this can be ensured by setting all sensitivities corresponding to the involved fold-change parameter to zero [12]. However, for the symmetric penalization of fold-changes, this approach is not suitable, because it would terminate optimization prematurely: If at any point during optimization, the case $r_{i}^{(j)} - r_{i}^{(k)} = 0, r_{i}^{(j)} \neq 0$ occurs, the next step would not change the values of either $r_{i}^{(j)}$ or $r_{i}^{(k)}$ , which is also true for all subsequent steps. Therefore, it would not be possible to ever reach the point $r_{i}^{(j)} = r_{i}^{(k)} = 0$ , which should be the ultimate optimization endpoint for λ→∞.

We propose to employ an alternative method that ensures that optimization is terminated correctly in the setting of symmetric penalization of fold-changes. While it is a necessary condition to set the sensitivities corresponding to the regularization residual and $r_{i}^{(j)}$ and $r_{i}^{(k)}$ to the same value, this value does not have to be zero. Instead, we propose to use the mean of sensitivities corresponding to $r_{i}^{(j)}$ and $r_{i}^{(k)}$ for every residual, i.e.,

\mathrm{sres}_{l m} \to \mathrm{mean} (\mathrm{sres}_{l m}, \mathrm{sres}_{n m}) \quad \forall m \quad \text{and}

\mathrm{sres}_{n m} \to \mathrm{mean} (\mathrm{sres}_{l m}, \mathrm{sres}_{n m}) \quad \forall m,

which avoids zero-valued sensitivities wherever possible, and employs information from both individual sensitivities. We compared the performance to using the maximum of absolute values of sensitivities and found that using the mean resulted in better performance (S1 Text).

On the other hand, if at any point during optimization, criterion 6.3 is not fulfilled anymore, because the data contribution to the gradient indicates a step away from $| r_{i}^{(j)} - r_{i}^{(k)} | = 0$ , the sensitivity corresponding to the respective regularization residual res_m* is set to zero, to allow the exploration of alternative optima:

\mathrm{sres}_{l m^{*}} \to 0

\mathrm{sres}_{n m^{*}} \to 0
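The two manipulations of the sensitivity matrix can be sketched in Python as follows. The function names and the list-of-rows layout (`sres[m][c]` is the sensitivity of residual m w.r.t. parameter c, i.e., transposed relative to the index order sres_lm used in the text) are assumptions made for illustration only.

```python
def tie_columns(sres, l, n):
    """Replace columns l and n of the sensitivity matrix by their row-wise mean.

    With identical columns, every residual reacts identically to the two
    parameters, so they are treated alike in the next step.
    """
    tied = [row[:] for row in sres]  # copy so the input is left untouched
    for row in tied:
        mean = 0.5 * (row[l] + row[n])
        row[l] = mean
        row[n] = mean
    return tied

def release_pair(sres, m_star, l, n):
    """Zero the sensitivities of regularization residual m_star w.r.t. parameters l and n."""
    released = [row[:] for row in sres]
    released[m_star][l] = 0.0
    released[m_star][n] = 0.0
    return released

sres = [[1.0, 3.0, 5.0],    # a data residual
        [2.0, -2.0, 0.5]]   # hypothetical row with opposite-sign entries
tied = tie_columns(sres, 0, 1)
released = release_pair(sres, 1, 0, 1)
```

Note that for the regularization residual of the pair itself, the two sensitivities have opposite sign, so their mean is zero.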

Optimization step truncation

Due to the discontinuity in the derivative of the L_q regularization terms at sign-changes of the fold-change parameters, optimization step truncation was suggested by [12] to enable efficient optimization. In the setting of symmetric penalization of fold-changes, we truncate optimization steps to prevent sign changes also in all $r_{i}^{(j)} - r_{i}^{(k)}$ . If such a sign change would occur, we make a step directly to $r_{i}^{(j)} - r_{i}^{(k)} = 0$ instead, where the optimality criterion discussed above is evaluated before a subsequent step is performed.
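One possible way to realize such a truncation is sketched below in Python. It assumes that the whole proposed step is shortened along its direction until the first difference reaches zero; the text does not specify this detail, and the function name is hypothetical. If the list `r` includes the technical reference (entry 0 with step 0), sign changes of the fold-change parameters themselves are covered as well.

```python
from itertools import combinations

def truncate_step(r, step):
    """Scale `step` so that no difference r[j] - r[k] changes sign.

    Returns the scaling factor t in (0, 1] and the new point r + t * step.
    Assumption: the full step is shortened, not projected.
    """
    t = 1.0
    for j, k in combinations(range(len(r)), 2):
        d_old = r[j] - r[k]
        d_step = step[j] - step[k]
        if d_old * (d_old + d_step) < 0:  # the difference would cross zero
            t = min(t, -d_old / d_step)   # fraction of the step at which it hits zero
    return t, [ri + t * si for ri, si in zip(r, step)]

# r^(1) = 0 (technical reference), r^(2) = 0.2, r^(3) = 0.3
t, new_r = truncate_step([0.0, 0.2, 0.3], [0.0, 0.3, 0.1])
```

In this example the full step would swap the order of r⁽²⁾ and r⁽³⁾, so the step is halved and ends on the diagonal r⁽²⁾ = r⁽³⁾.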

Selection of the parsimonious model

Regularized optimization promotes sparsity, because $r_{i}^{(j)} - r_{i}^{(k)} = 0$ minimizes the penalty induced by the regularization function (Eq 5) and corresponds to the effect encoded by parameter p_i being identical between cell types j and k, which leads to a reduced number of degrees of freedom in the model. On the other hand, the data contribution to the objective function will almost always over-fit finite amounts of data to maximize the goodness-of-fit and promote $r_{i}^{(j)} - r_{i}^{(k)} \neq 0$ . Depending on the regularization strength λ, both the data and the regularization contribution to the objective function (Eq 3) determine the optimal values of the fold-change parameters. Finding a value for the regularization strength that balances model parsimony with goodness-of-fit is therefore crucial for using regularization to construct useful mathematical models.

A two-step model selection approach was proposed to identify the optimal regularization strength λ* [12]. First, optimization of the regularized objective function is performed for a discrete set of regularization strengths, usually ranging over several orders of magnitude. With the model structure constrained to the clusters $r_{i}^{(j)} - r_{i}^{(k)} = 0$ identified in the first step, the objective function is then optimized a second time without regularization for each λ, to obtain unbiased parameter estimates. Finally, for each λ, a statistical test, e.g., a likelihood ratio test, is performed w.r.t. the unregularized objective function $L (\vec{p}, \vec{r}, λ = 0)$ . The optimal regularization strength is then given by the largest value for which the statistical test does not reject the constrained model as inconsistent with the data.

To evaluate the statistical test, usually the number of degrees of freedom in the two alternative models must be taken into account. For the standard LASSO regularization, these are the number of fitted parameters minus the number of fold-change parameters that are equal to zero. For the symmetric penalization of fold-changes applied here, it is important to correctly take into account the case $r_{i}^{(j)} - r_{i}^{(k)} = 0$ . This can occur for $r_{i}^{(j)} \neq 0$ and $r_{i}^{(k)} \neq 0$ , which corresponds to reduction in the number of degrees of freedom by one, or alternatively, when $r_{i}^{(j)} = r_{i}^{(k)} = 0$ . When counting the number of fold-changes and differences of fold-changes that are equal to zero, the latter case should reduce the number of degrees of freedom by two only, i.e., the number of degrees of freedom is given by

m_{λ} = \# r_{i}^{(j)} - \# {(r_{i}^{(j)} = 0)}_{λ} - \# {(r_{i}^{(j)} - r_{i}^{(k)} = 0 \ \& \ r_{i}^{(j)} \neq 0)}_{λ} .
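A literal reading of this counting rule for a single parameter can be sketched in Python. The function name, the tolerance `eps`, and the restriction to the fold changes of the non-reference cell types are assumptions made for illustration.

```python
from itertools import combinations

def degrees_of_freedom(r, eps=1e-10):
    """Count m_lambda for the fold changes `r` of the non-reference cell types.

    Fold changes equal to zero and pairs of equal, nonzero fold changes
    each remove one degree of freedom.
    """
    n_total = len(r)
    n_zero = sum(abs(v) < eps for v in r)
    n_equal_nonzero = sum(
        abs(a - b) < eps and abs(a) >= eps for a, b in combinations(r, 2)
    )
    return n_total - n_zero - n_equal_nonzero

examples = {
    "all different": degrees_of_freedom([0.2, 0.3]),
    "equal, nonzero": degrees_of_freedom([0.25, 0.25]),
    "both zero": degrees_of_freedom([0.0, 0.0]),
}
```

The case of both fold changes being zero removes two degrees of freedom, not three, because the vanishing difference is not counted in addition.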

In the following, we use a likelihood ratio test with α = 0.05, which follows the work of Steiert et al. The test statistic then reads

D (λ) = L (\vec{p}, \vec{r}, λ) - L (\vec{p}, \vec{r}, λ = 0),

which, according to Wilks’ theorem [25], is distributed as a chi-squared distribution with m_λ degrees of freedom.
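The test decision can be sketched in Python using only the standard library. The survival function of the chi-squared distribution is computed through the recurrence of the regularized incomplete gamma function for integer degrees of freedom; the function names and the example objective values are hypothetical, and `df` stands for the degrees of freedom of the test as defined in the text.

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = 0.5 * x
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5  # closed form for df = 1
    else:
        q, a = math.exp(-half), 1.0             # closed form for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def constrained_model_accepted(obj_constrained, obj_full, df, alpha=0.05):
    """Likelihood ratio test: keep the constrained model unless it fits significantly worse."""
    if df == 0:
        return True  # no constraints were introduced, nothing to test
    d = obj_constrained - obj_full
    return chi2_sf(d, df) >= alpha
```

For example, with one degree of freedom an increase of the objective function by 1 is accepted, whereas an increase by 10 leads to rejection of the constrained model at α = 0.05.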

IV. Results

We implemented routines for the LASSO regularization with symmetric penalization of fold-changes into the open-source modeling environment for dynamical systems, Data2Dynamics [26]. It also includes routines for the standard LASSO regularization [12], to which we compare our method. Our implementation is applicable to all classes of models and data that can be implemented into Data2Dynamics.

Application to a toy model with simulated data

Model description

To showcase our method of symmetric penalization of fold-changes, we apply it to the following toy model that depicts the exponential decay of a species x with the rate constant p (Fig 2A):

\dot{x} (t) = - p x (t) .

Fig 2 — **(A)** Schematic representation of the model structure. Each arrow represents a degradation reaction in one of the three cell types. **(B)** Model parameter values used for simulation, i.e., the kinetic rate constants and initial concentrations used in the three cell types. **(C)** A typical data realization (means and error bars) with un-regularized model fits (lines). **(D)** Objective function landscapes with the regularized best-fit parameter vector (red dot) for different regularization strengths λ. Square brackets indicate a significant decrease in likelihood in terms of a likelihood ratio test.

For simplification, we assumed x(t = 0) = 1 for the initial concentration and a direct observation of the state x, i.e., g(x(t),p) = x(t), such that p is the only remaining free parameter. As a ground truth, we chose three cell types with a value for log p of -1.5 for cell type 1, -1.3 for cell type 2 and -1.2 for cell type 3 (Fig 2B), translating to the fold change parameters being r⁽²⁾ = 0.2, r⁽³⁾ = 0.3. We simulated data with normally distributed experimental errors with log σ_t = -1.3 (Fig 2C). The parameter values were chosen such that p has similar values in cell types 2 and 3, which are both clearly different from the value in cell type 1, and can be thought of as a common mutation in cell types 2 and 3. For illustrative purposes, r⁽²⁾ and r⁽³⁾ were chosen to differ slightly.
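A minimal Python sketch of this simulation setup is given below. Several details are assumptions that the text does not fix: the logarithms are taken to base 10, the measurement times in `TIMES` are hypothetical, and the data contribution is written as a plain sum of squared, σ-weighted residuals.

```python
import math
import random

def decay(t, log10_p):
    """Solution x(t) = exp(-p t) of dx/dt = -p x with x(0) = 1."""
    return math.exp(-(10.0 ** log10_p) * t)

def simulate(log10_p, times, sigma, rng):
    """Noisy observations of x(t) with normally distributed errors."""
    return [decay(t, log10_p) + rng.gauss(0.0, sigma) for t in times]

def chi2(log10_p, times, data, sigma):
    """Data contribution: sum of squared residuals weighted by sigma."""
    return sum(((y - decay(t, log10_p)) / sigma) ** 2 for t, y in zip(times, data))

LOG10_P = [-1.5, -1.3, -1.2]            # ground truth for cell types 1, 2, 3 (from the text)
SIGMA = 10.0 ** -1.3                    # from log sigma = -1.3, assuming base 10
TIMES = [0.0, 10.0, 20.0, 40.0, 80.0]   # hypothetical measurement times

rng = random.Random(0)  # fixed seed for reproducibility
data = [simulate(lp, TIMES, SIGMA, rng) for lp in LOG10_P]
fold_changes = [lp - LOG10_P[0] for lp in LOG10_P]  # r^(1) = 0, r^(2), r^(3)
```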

Approach

Next, the toy model was fitted to the simulated data with the aim of retrieving the true parameter values stated above. We compared the application of three different regularization approaches: (i) Standard LASSO regularization, (ii) the proposed symmetric penalization of fold-changes with the L_0.8 pseudonorm, and (iii) with an L₁ norm instead. Each scenario was implemented with cell type 1 as the technical reference to be able to compare the results. For reasons of simplicity, we assume the true value of p in the technical reference cell type is known, so that the resulting parameter space has only two dimensions.

For five values of λ, the objective function is shown in the r⁽²⁾−r⁽³⁾-plane together with the respective optimization endpoints after each step of increasing λ (Fig 2D). Depending on the regularization strength λ, the total objective function changes from being dominated by the data through χ², to mainly depicting the regularization function $ν (\vec{r})$ (Eq 3). Note that in this visualization of the optimization path, each point on the diagonal r⁽²⁾−r⁽³⁾ = 0 corresponds to a common mutation in cell types 2 and 3 as used for simulation, apart from the origin r⁽²⁾ = r⁽³⁾ = 0 where all three cell types would comprise the same parameter value.

Results

It can be observed that independent of the regularization approach, for low regularization strengths, the optimization end points are close to but not exactly on the diagonal r⁽²⁾−r⁽³⁾ = 0, indicating that the model fit overly adjusts to the specific data realization. Note that the values of the parameters cannot be directly compared with their true values, since they are biased through the regularization.

For the standard LASSO regularization, the total objective function changes towards a sloped surface with a gradient pointing towards the coordinate origin (Fig 2D, left panel). Consequently, even though starting out closely, optimization end points only reach the diagonal at the origin where r⁽²⁾ = r⁽³⁾ = 0. This indicates that with this regularization approach, only the outcome of all cell types being equal can be found, which, however, is to be rejected by the likelihood ratio test. Therefore, with standard LASSO regularization, the parsimonious model represents different mutations in cell type 2 and 3.

Regularization with symmetric penalization of fold-changes leads to a different result: Due to the L_0.8 pseudonorm of the differences between the fold-change parameters, the additional gradient induced by the regularization function implies a curved path towards the diagonal r⁽²⁾−r⁽³⁾ = 0 (Fig 2D, right panel). Therefore, in contrast to the standard LASSO regularization, the equal mutation in cell types 2 and 3 is discovered at λ = 32, before ultimately the optimization end point arrives at the coordinate origin. The likelihood ratio test correctly identifies the former option as the optimal parsimonious model in agreement with the model used to simulate the data.

When employing the L₁ norm instead (S1 Fig), the additional gradient induced by the regularization function also implies a path towards the diagonal r⁽²⁾−r⁽³⁾ = 0, but it is not curved as for the L_0.8 pseudonorm and therefore longer. This renders the use of the L₁ norm less efficient for identifying common mutations between cell types. In the presented example, the optimization end point does not reach the diagonal until a larger regularization strength of λ = 140.

Application to a biological model with simulated data

Model description

To systematically assess the performance of the symmetric penalization of fold-changes also in a realistic setting, we employ the model of [27] for information processing at the erythropoietin (Epo) receptor, where a mathematical model was established for Epo-receptor dynamics upon ligand binding and calibrated with experimental data, including time-resolved dose-response measurements. This model, which is part of the collection of benchmark models for dynamical modeling of intracellular processes [28], includes six biochemical species, four observables with 85 data points and 16 parameters in total (Fig 3A).

Fig 3 — **(A)** A schematic representation of the structure of the model used in [27]. **(B)** A typical data realization (dots) with model fits before regularization (lines and shaded areas). **(C)** Parameter values used for simulation of the data for the five cell types. **(D)** Schematic representation of the simulation study work-flow. **(E)** The number of times each individual fold-change parameter and difference of fold-change parameters is constrained to zero in the parsimonious model is plotted as a histogram for the standard LASSO approach (blue) and the symmetric penalization of fold-change differences (red). A gray bin background indicates the true values that were used for simulation of the data.

Approach

We simulated data in the same configuration as the actual biological data, i.e., with the same observables, time points/doses and measurement errors, as follows: For the technical reference cell type, we employed the best fit parameters of the original publication. We simulated 100 data sets in four additional cell types (Fig 3B) while varying three of the model parameters, which represents possible mutations (Fig 3C). We applied regularization with symmetric penalization of fold-changes to all 100 data sets to find common mutations in the five cell types related to the parameters listed in Fig 3C. We fixed the rate constant k_ex and the offset parameter to 10⁻⁵ because they were practically non-identifiable with the experimental data set included in Data2Dynamics. Because the parameters of the observation function are typically not related to the investigated biological system, they are excluded from regularization. We then counted how often a fold-change parameter or a difference of fold-change parameters is correctly identified as being compatible with zero in the selected parsimonious model, and repeated the whole process with the standard LASSO regularization for comparison (Fig 3D).
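The counting step at the end of this workflow can be sketched as follows. Given, for each simulated data set, the fold-change parameters of the selected parsimonious model, the sketch tallies how often each fold-change and each pairwise difference is zero. The function name, the tolerance and the toy input are hypothetical and only illustrate the bookkeeping; they are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def count_zeros(fits, tol=1e-6):
    """Count zero fold-changes and zero pairwise differences.

    fits: list of dicts, one per data set, mapping a cell-type label to the
    fitted log fold-change of one parameter relative to the reference.
    Keys of the result are (cell,) for a fold-change and (cell_a, cell_b)
    for a difference of fold-changes.
    """
    counts = Counter()
    for fit in fits:
        for cell, value in fit.items():
            if abs(value) < tol:
                counts[(cell,)] += 1
        for a, b in combinations(sorted(fit), 2):
            if abs(fit[a] - fit[b]) < tol:
                counts[(a, b)] += 1
    return counts


# Toy input: three hypothetical parsimonious fits for cell types 2 to 4.
fits = [
    {2: 0.0, 3: 0.7, 4: 0.7},
    {2: 0.0, 3: 0.7, 4: 0.9},
    {2: 0.1, 3: 0.5, 4: 0.5},
]
counts = count_zeros(fits)
print(counts[(2,)], counts[(3, 4)])
```

Comparing such counts with the parameter values used for simulation gives the histogram-style summary described for the simulation study.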

Results

For regularization with symmetric penalization of fold-changes, the overall number of correctly identified fold-change parameters and differences of those is very high (Fig 3E). At the same time, there are virtually no false positive results, which is indicated by the absence of cell types identified as having a parameter in common when this is not the case (Fig 3C). In comparison to the standard LASSO regularization, the proposed method has identical performance with respect to the fold-change parameters that relate to the technical reference cell type in this setting. The differences of fold-change parameters equal to zero that represent pairs of cell types with the same mutation are only found using regularization with symmetric penalization of fold-changes. We can observe that in some cases, it is more challenging to identify a fold-change parameter or a difference of fold-change parameters as being zero, e.g., for the difference between the initial value for Epo in cell types 2 and 3. We interpret this as an effect of all parameters being regularized with the same strength λ, while not all parameters have the same influence on model simulations due to the non-linearity of the models, which however also affects the results of the standard LASSO regularization.

Application to experimental data

Model description

To investigate the effect of ligand addiction, which drives tumor growth, Hass et al. developed a mechanistic model that comprises multiple signal transduction pathways including ErbB, IGF-1R and Met signaling [19]. This model was calibrated on experimental data for seven cancer cell-lines, which was possible through introduction of cell-line specific parameters for receptor expression, i.e., the initial concentration of the receptor model species. This comprehensive model was later employed to predict proliferation behavior in 58 cancer cell-lines as well as, in combination with decision tree classification, in humans from actual patient data.

Approach

It was reported that it is possible to fit the experimental data of the individual cell-lines with the same model structure assuming equal kinetic rates and cell-line specific receptor abundance based on the standard LASSO approach ([19], Eq 4). Because this method is in general not able to identify subgroups of cell-lines that share the same receptor expression when they do not include the reference cell-line, we investigated the clusters resulting from symmetric penalization of the fold-change parameters. Following the work of Hass et al. [19], we added regularization terms to the objective function for parameters that relate the receptor expression of the cell-lines BxPc3, A431, BT-20, ACHN, ADRr and IGROV-1 to that of H322M. We applied our approach with the original experimental data and 30 regularization strengths ranging between 1 and 10⁴ and identified clusters of cell-lines that share identical receptor expression. We incorporated these clusters into our model and utilized the likelihood ratio test to select the parsimonious model.
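Reading clusters off a regularized fit can be sketched in a few lines of Python. The sketch groups cell-lines whose log fold-changes for one receptor coincide within a tolerance, with the reference cell-line entering at a fold-change of zero. The function name, the tolerance and all numerical values are invented for illustration and are not results of the study.

```python
def clusters_from_fold_changes(fold_changes, reference, tol=1e-6):
    """Group cell-lines whose log fold-changes coincide within tol.

    fold_changes: dict mapping a cell-line name to the log fold-change of a
    receptor parameter relative to the reference cell-line.
    Assumption: the reference itself has a log fold-change of 0.
    """
    values = {reference: 0.0, **fold_changes}
    clusters = []
    for name, value in sorted(values.items(), key=lambda item: item[1]):
        if clusters and abs(value - clusters[-1]["value"]) < tol:
            clusters[-1]["members"].append(name)
        else:
            clusters.append({"value": value, "members": [name]})
    return [sorted(cluster["members"]) for cluster in clusters]


# Invented fold-changes for one receptor, relative to H322M.
example = {
    "BxPc3": 0.8,
    "A431": 1.9,
    "BT-20": 0.8,
    "ACHN": 0.3,
    "ADRr": 0.0,
    "IGROV-1": 0.0,
}
clusters = clusters_from_fold_changes(example, reference="H322M")
print(clusters)
# One value per cluster is needed; the cluster containing the reference is
# fixed, so the remaining clusters are the cell-line specific parameters.
print(len(clusters) - 1)
```

In this toy input, a cluster that does not contain the reference (BT-20 and BxPc3) is recovered alongside the cluster around the reference, which is the kind of subgroup the paragraph above says a reference-based penalty cannot identify in general.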

Results

Introducing regularization to the objective function promotes model sparsity, i.e., a low number of cell-line specific parameters (S2A Fig). These constraints result in worse objective function values also in the unregularized setting when compared to the unconstrained model, increasing the likelihood ratio (S2B Fig). We found the optimal value of the regularization strength to be λ = 10^1.75, where the constrained model is still compatible with the full model of [19]. At this value of the regularization strength, we find a number of clusters that share the same receptor expression among the seven cell-lines (S2C Fig). When scanning through different values of λ, new optima can arise that lead to different combinations of fold-change parameters and differences of such equal to zero. Such behavior can lead to a drop in the test statistic when increasing the value of λ.
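The selection of the regularization strength can be sketched as follows, assuming that each scanned λ yields a likelihood ratio test statistic between the constrained and the full model together with the number of parameters removed. The function name, the 5% significance level, the hard-coded chi-square quantiles and the toy scan are assumptions made for illustration only.

```python
# 95% quantiles of the chi-square distribution for 1 to 5 degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def select_lambda(scan):
    """Return the largest lambda whose constrained model is not rejected.

    scan: list of (lam, lr_statistic, df) tuples, where lr_statistic is the
    likelihood ratio test statistic of the constrained versus the full model
    and df is the number of parameters removed by the constraints.
    Returns None if every constrained model is rejected.
    """
    accepted = [lam for lam, lr, df in scan if df == 0 or lr < CHI2_95[df]]
    return max(accepted) if accepted else None


# Toy scan with a drop in the test statistic between lambda = 100 and 316,
# as can happen when a new optimum emerges.
scan = [
    (1.0, 0.0, 0),
    (10.0, 1.2, 1),
    (100.0, 6.5, 2),
    (316.0, 4.9, 2),
    (1000.0, 14.0, 3),
]
print(select_lambda(scan))
```

Taking the maximum over all accepted values, rather than stopping at the first rejection, keeps the rule well defined when the test statistic is not monotonic in λ.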

Values for the receptor surface levels for the different cell-lines calculated from data of the CCLE database were reported in [19], which can be compared to the estimated model parameters. We additionally compare their relative amounts to the receptor values from their model that were estimated with penalization of fold-change differences (Fig 4). For the EGF receptor, we find a cluster of the three cell-lines with the lowest EGFR surface level, IGROV-1, ADRr and H322M. For IGF-1R and Met receptors, clustering also resembles the surface receptor value ordering. However, for the ErbB2 and ErbB3 receptors, the clustering of cell-lines resulting from regularized optimization is more challenging to interpret: In ErbB2, clustering seems unrelated to the receptor surface levels, while in ErbB3, cell-lines with similar receptor surface levels are clustered together, but the model estimates for these clusters do not reflect the ordering in the CCLE-based data. However, the latter two effects also occur without regularization in the best fit reported by [19], which indicates that these results are not related to the symmetric penalization of fold-change differences, but rather result from the model structure or are artifacts in the CCLE data. Through the identification of clusters of cell-lines that share identical receptor surface levels, we demonstrated how the method we propose provides additional insights compared to the classical LASSO approach.

Fig 4 — Comparison of receptor surface level values from [19] for EGF, ErbB2, ErbB3, IGF-1 and Met receptors for the different cell-lines H322M, BxPc3, A431, BT-20, ACHN, ADRr and IGROV-1 with the estimated initial values from the mathematical model, as well as the results from regularization with symmetric penalization of fold-change differences.

Conclusions

Regularization is a valuable method to reduce model complexity, e.g., when dealing with multiple cell types. In ordinary differential equation models, it is often applied to infer candidates for parsimonious models from data by clustering similar cell types on the level of individual parameters. Biologically, this can be related to cell types that have identical mutations. Approaches based on LASSO regularization with L₁ and L_q penalization for two cell types are readily available, but the question of how to handle n>2 cell types without biasing the result through the specification of a reference cell type remained open. The grouped LASSO was proposed to address this challenge when predefined groups of parameters can be specified, for example based on prior knowledge. This translates to one parameter being either identical in or specific to all analyzed cell types, without being able to search for subsets that share a certain parameter or mutation.

We proposed an extension of the LASSO approach in differential equation modeling motivated by the clustered LASSO [16] with an L_0.8 penalization, which is the regularization with symmetric penalization of fold-change differences. We highlighted that this method allows treating an arbitrary number of cell types without the result being dependent on the arbitrary choice of a reference cell type or introducing any additional parameter. We argued that employing an L_0.8 pseudonorm instead of the usual L₁ norm is beneficial, because it provides a gradient that leads to more efficient clustering. We discussed how optimization must be adapted in the presence of the new regularization function in terms of the additional residuals and sensitivities. Further, we adapted the optimality criterion and optimizer step truncation accordingly, as well as the calculation of degrees of freedom that is required to perform statistical tests. Concerning computational cost, the computation time required for optimizing a system comprising multiple cell types is multiplied by the number of cell types, which holds true also in a non-regularized setting and is independent of the clustered LASSO approach. A general performance analysis of ODE models can be found in [29].

We demonstrated the advantages of regularization with symmetric penalization of fold-change differences compared to the standard LASSO approach with the L_0.8 penalization when using more than two cell types in a simplistic example by visualizing the optimization end points of the objective function landscape in parameter space under the influence of increasingly strong regularization. We evaluated the effect on performance of symmetric penalization of fold-change differences in a simulation study: It revealed how our method extends the usefulness of regularization approaches compared to the standard and grouped LASSO, by enabling clustering of any number of cell types that share certain mutations.

We applied our method to a published model of the Epo receptor dynamics upon ligand binding by Becker et al. [27] with realistically simulated data. This confirmed the added value of symmetric penalization of fold-change differences compared to standard and grouped LASSO also in a realistic setting. We also revisited the research question of [19] and performed LASSO regularization with symmetric penalization of fold-change parameters on the exact same problem including biological data. We were able to identify clusters of cell-lines that share the same receptor expression, which were confirmed by CCLE-based receptor surface level data where such a comparison is applicable.

We would like to emphasize that, as in all modeling approaches in biology, interpretations of results can be challenging. Moreover, due to data sparsity and limitations of data quality, false positive and false negative results can be obtained, as is also the case for the proposed method. Resulting models should always be seen as a useful approach to understand biological mechanisms and resulting model predictions should subsequently be validated experimentally.

In summary, we illustrated how the proposed method will advance the analysis of multiple cell types. Since we implemented our method in the freely available open-source modeling environment Data2Dynamics, it can be easily applied to a broad range of modeling problems, especially in but not limited to the context of systems biology.

Supporting information

S1 Text. Comparison of common values for sensitivities.

(PDF)

S1 Fig

Objective function landscapes with the regularized best-fit parameter vector (red dot) for different regularization strengths λ for symmetric penalization of fold-change parameters with the L₁ norm. The square bracket indicates a significant decrease in likelihood in terms of a likelihood ratio test.

(PDF)

S2 Fig

(A) The number of cell type specific parameters dependent on the regularization strength in the model of [19] with symmetric penalization of fold-change differences. (B) The likelihood ratio (blue line) between the constrained and unconstrained model increases with the regularization strength. The largest value of λ for which the likelihood ratio is below the statistical threshold (red line) represents the parsimonious model (black line). The drop in the test statistic can be attributed to the emergence of a new optimum. (C) Fold-change parameters and differences of them with their values dependent on the regularization strength λ. Boxes denote the regions where a parameter is not constrained to zero, while the dashed line indicates the regularization strength corresponding to the parsimonious model.

(PDF)

S3 Fig. Simulation study (Fig 3C) with an ABC model to assess the performance of regularization with symmetric penalization of fold-change differences when using (A) the maximum of sensitivities corresponding to the same residual as a common value, and (B) the mean of sensitivities as a common value.

(PDF)

Acknowledgments

We thank Daniel Lill for his suggestions during the conceptualization phase of this study, and Marcel Schilling for his input on the biological relevance of this work.

Data Availability

The method of regularization with symmetric penalization of fold-change parameters is implemented in the open-source Matlab toolbox Data2Dynamics and is accessible via GitHub https://github.com/Data2Dynamics/d2d. A demonstration script can be found under arFramework3/Examples/Becker_Science_2010/Setup_Regularization.m.

Funding Statement

This work was funded by the German Research Foundation (DFG) under Germany’s Excellence Strategy (CIBSS – EXC-2189 – Project ID 390929984; A.H.), the SFB 1381 (Project ID 403222702, A.H.), and the TRR 179 (Project ID 272983813, M.R.). We acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation through grant INST 35/1134-1 FUGG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.Legewie S, Blüthgen N, Herzel H. Mathematical Modeling Identifies Inhibitors of Apoptosis as Mediators of Positive Feedback and Bistability. Sander C, editor. PLoS Comput Biol. 2006. Sep 15;2(9):e120. doi: 10.1371/journal.pcbi.0020120 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Oppelt A, Kaschek D, Huppelschoten S, Sison-Young R, Zhang F, Buck-Wiese M, et al. Model-based identification of TNFα-induced IKKβ-mediated and IκBα-mediated regulation of NFκB signal transduction as a tool to quantify the impact of drug-induced liver injury compounds. Npj Syst Biol Appl. 2018. Dec;4(1):23. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Zitzmann C, Schmid B, Ruggieri A, Perelson AS, Binder M, Bartenschlager R, et al. A Coupled Mathematical Model of the Intracellular Replication of Dengue Virus and the Host Cell Immune Response to Infection. Front Microbiol. 2020. Apr 29;11:725. doi: 10.3389/fmicb.2020.00725 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bachmann J, Raue A, Schilling M, Böhm ME, Kreutz C, Kaschek D, et al. Division of labor by dual feedback regulators controls JAK2/STAT5 signaling over broad ligand range. Mol Syst Biol. 2011. Jul 19;7:516. doi: 10.1038/msb.2011.50 [DOI] [PMC free article] [PubMed] [Google Scholar]
5.D’Alessandro LA, Klingmüller U, Schilling M. Deciphering signal transduction networks in the liver by mechanistic mathematical modelling. Biochem J. 2022. Jun 30;479(12):1361–74. doi: 10.1042/BCJ20210548 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Gottschalk RA, Martins AJ, Angermann BR, Dutta B, Ng CE, Uderhardt S, et al. Distinct NF-κB and MAPK Activation Thresholds Uncouple Steady-State Microbe Sensing from Anti-pathogen Inflammatory Responses. Cell Syst. 2016. Jun;2(6):378–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Luecke S, Sheu KM, Hoffmann A. Stimulus-specific responses in innate immunity: Multilayered regulatory circuits. Immunity. 2021. Sep;54(9):1915–32. doi: 10.1016/j.immuni.2021.08.018 [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Imoto H, Yamashiro S, Okada M. A text-based computational framework for patient-specific modeling for classification of cancers. iScience. 2022. Mar;25(3):103944. doi: 10.1016/j.isci.2022.103944 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Merkle R, Steiert B, Salopiata F, Depner S, Raue A, Iwamoto N, et al. Identification of Cell Type-Specific Differences in Erythropoietin Receptor Signaling in Primary Erythroid and Lung Cancer Cells. PLoS Comput Biol. 2016;12(8):e1005049. doi: 10.1371/journal.pcbi.1005049 [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Pfeifer D, Pantic M, Skatulla I, Rawluk J, Kreutz C, Martens UM, et al. Genome-wide analysis of DNA copy number changes and LOH in CLL using high-density SNP arrays. Blood. 2007;109(3):1202–10. doi: 10.1182/blood-2006-07-034256 [DOI] [PubMed] [Google Scholar]
11.Box GEP, Hill WJ. Discrimination among Mechanistic Models. Technometrics. 1967;9(1):57–71. [Google Scholar]
12.Steiert B, Timmer J, Kreutz C. L1 regularization facilitates detection of cell type-specific parameters in dynamical systems. Bioinformatics. 2016. Sep 1;32(17):i718–26. doi: 10.1093/bioinformatics/btw461 [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267–88. [Google Scholar]
14.Schmidt M, Fung G, Rosales R. Optimization Methods for ℓ1-Regularization. UBC Tech Rep TR-2009-19. 2009; [Google Scholar]
15.Dolejsch P, Hass H, Timmer J. Extensions of ℓ1 regularization increase detection specificity for cell-type specific parameters in dynamic models. BMC Bioinformatics. 2019. Dec;20(1):395. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.She Y. Sparse regression with exact clustering. Electron J Stat [Internet]. 2010. Jan 1 [cited 2022 Jul 6];4. Available from: https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-4/issue-none/Sparse-regression-with-exact-clustering/10.1214/10-EJS578.full [Google Scholar]
17.Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol. 2006. Feb;68(1):49–67. [Google Scholar]
18.Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-β Superfamily. Cell Syst. 2018. Oct;7(4):422–437.e7. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Hass H, Masson K, Wohlgemuth S, Paragas V, Allen JE, Sevecka M, et al. Predicting ligand-dependent tumors from multi-dimensional signaling features. Npj Syst Biol Appl. 2017. Dec;3(1):27. doi: 10.1038/s41540-017-0030-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
20.De Landtsheer S, Lucarelli P, Sauter T. Using Regularization to Infer Cell Line Specificity in Logical Network Models of Signaling Pathways. Front Physiol. 2018. May 22;9:550. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Kemmer S, Berdiel-Acer M, Reinz E, Sonntag J, Tarade N, Bernhardt S, et al. Disentangling ERBB Signaling in Breast Cancer Subtypes—A Model-Based Analysis. Cancers. 2022. May 12;14(10):2379. doi: 10.3390/cancers14102379 [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Kok F, Rosenblatt M, Teusel M, Nizharadze T, Gonçalves Magalhães V, Dächert C, et al. Disentangling molecular mechanisms regulating sensitization of interferon alpha signal transduction. Mol Syst Biol. 2020. Jul;16(7):e8955. doi: 10.15252/msb.20198955 [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Lucarelli P, Schilling M, Kreutz C, Vlasov A, Boehm ME, Iwamoto N, et al. Resolving the Combinatorial Complexity of Smad Protein Complex Formation and Its Link to Gene Expression. Cell Syst. 2018. Jan;6(1):75–89.e11. doi: 10.1016/j.cels.2017.11.010 [DOI] [PubMed] [Google Scholar]
24.Lao-Martil D, Schmitz J, Teusink B, van Riel N. Elucidating yeast glycolytic dynamics at steady state growth and glucose pulses through kinetic metabolic modeling. Metabolic Engineering. 2023. May;77:127–142. [DOI] [PubMed] [Google Scholar]
25.Wilks SS. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann Math Stat. 1938. Mar;9(1):60–2. [Google Scholar]
26.Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, et al. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015. Nov 1;31(21):3558–60. doi: 10.1093/bioinformatics/btv405 [DOI] [PubMed] [Google Scholar]
27.Becker V, Schilling M, Bachmann J, Baumann U, Raue A, Maiwald T, et al. Covering a Broad Dynamic Range: Information Processing at the Erythropoietin Receptor. Science. 2010. Jun 11;328(5984):1404–8. doi: 10.1126/science.1184913 [DOI] [PubMed] [Google Scholar]
28.Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Stegle O, editor. Bioinformatics. 2019. Sep 1;35(17):3073–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Raue A, Schilling M, Bachmann J, Matteson A, Schelker M, Kaschek D, et al. Lessons learned from quantitative dynamical modeling in systems biology. PLOS One. 2013. Sep 30;8(9):e74335. doi: 10.1371/journal.pone.0074335 [DOI] [PMC free article] [PubMed] [Google Scholar]

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010867.r001

Decision Letter 0

Kiran R Patil

3 Apr 2023

Dear Mr. Hauber,

Thank you very much for submitting your manuscript "Uncovering specific mechanisms across cell types in dynamical models" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Kiran Patil

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: See attached file

Reviewer #2: The authors of this paper suggest utilizing clustered LASSO regularization in dynamical models described by ordinary differential equations. The motivation for such regularization is to enable the identification of mechanisms specific to certain cell types, whatever the number of cell types, without having a bias introduced by selecting a particular reference cell type. It seems it is a follow-up of previous research conducted by one of the co-authors, as mentioned in references [12] and [15]. To showcase the proposed regularization, the authors present three case studies.

The text is well-structured, technically correct, and relatively clear. Below are some remarks that might improve the manuscript.

Comments:

1. From Figure 1b, it is clear that when Lambda is sufficiently large, the parameter estimation process will strongly favor maximizing the number of parameters with identical values due to symmetric penalization. Although this is the main reason for implementing this form of regularization, it can lead to the identification of multiple false positives, such as several mutations that are identified to be common but are not. This is especially probable in underdetermined problems, where parameters can vary widely due to the scarcity of observations, as noted by Gutenkunst et al (PloS Comp Bio, 2007). Since the authors focus on biological systems as examples, which generally lack sufficient observations compared to the number of parameters that need to be identified, this issue should be explicitly mentioned and discussed. Can we avoid this problem at all?

2. Related to my previous remark, in the presented toy example, it is possible for cell types 2 and 3 to have distinct mutations, yet the proposed algorithm might identify them as common. Moreover, since clustered LASSO does not yield the actual parameter values (instead, it tends to shrink them toward zero, like standard LASSO), it is unclear why one should apply this regularization in such a scenario. In general, the authors should provide clear guidance to readers on when it is appropriate and when it is not advisable to use their regularization method.

3. Given that linear correlations between parameters are typical in various systems, I would like to know if this regularization method can be extended to address such cases. I assume that the correlation coefficients would probably appear as the optimization variables in the penalty term. Can you comment on this?

4. The explanation in lines 310-320 and Figure 2 could be clearer regarding the toy model. It is hard to distinguish what the authors consider as reality and what is meant to provide “realistic” data for the example. Moreover, in the text, the species is denoted with x, in Figure as A. The value for parameter p in the text is logp=-1.5 whereas in Figure 2b we find -2 (I was trying to deduce based on the graph in Figure 2c what was used). Next, the values in Figure 2b are not p (as written at the top of the table), these should be log p. Similarly in Figure 2A, you have a mix of notations. With the notation presented earlier in the manuscript, I would not say that you can write p1r1^(2) and p1r1(3).

5. Regarding the size of systems this kind of regularization can be used on, how does this method scale computationally when the number of cell types increases and the number of parameters? Please provide the computational time estimates.

6. Since this regularization is proposed in the context of nonlinear optimization, how can you guarantee you found a satisfying result? Furthermore, can you enumerate all alternative solutions of clustered parameters? For example, in the study depicted in Figure 4, there are cell line groups that have the same parameter values. Will these groups change if you converge on a different solution?

7. Lines 240-250, to avoid that r_i^(j)-r_i^(k)=0, r_i^(j) <>0 occurs, would it work using a kind of forgetting factor?

8. The notation throughout the paper is confusing. For example, lines 120-126, one understands that in r_i^(j) the subscript i denotes the i-th parameters, whereas the superscript j denotes the cell type. But then, lines 199-200, we see the same subscript i denoting “the l-th and the o-th parameter”. Furthermore, lines 210-212, what the subscript l-n denotes. In general, for most superscripts and subscripts, their range is missing.

9. Line 347, there is a floating sentence fragment.

Reviewer #3: Hauber, Rosenblatt and Timmer present the development of LASSO regularization in context of estimating parameters of dynamic models of cellular pathways. Their method is a generalization of the LASSO regularization technique for n > 2 conditions, which enables reference-free distinction of samples, based on their estimated parameter values. Instead of penalizing changes in parameters of a perturbed cell state, compared to a predefined reference parameter set, their grouped LASSO approach penalizes parameter differences across all possible pairs of samples. Under the assumption that all conditions can be distinguished only by changes in parameter values with respect to each other, and not by structural model changes, their method provides a minimal parameter change to describe different conditions. Furthermore, the authors propose a modification to MATLAB’s lsqnonlin optimizer to account for the additional penalty terms. In general, this approach is relevant for improving cell state distinction through dynamic modelling, as it reduces bias based on experimental group selection, and enables robust identification of multiple subgroups within predefined experimental groups.

The authors provide three examples, illustrating the application of their proposed regularization approach, ranging from fully simulated to completely experimental, where an experiment by Hass et al. (2017) is repeated, incorporating the proposed grouped LASSO approach. A major concern, however, relates to the method of hyperparameter tuning. In the proposed approach, selection of the hyperparameter occurs by performing a likelihood ratio test on the fitted likelihood values. While the approach taken seems justified in general, the likelihood ratio test should instead be performed on likelihood values calculated on samples unseen during the parameter estimation procedure. This prevents biasing the results towards the specific samples included in the data. This reviewer recommends including a cross-validation approach to strengthen the hyperparameter selection.

Additionally, the following minor points should be addressed and possibly included:

1) In the explanation on parameter estimation in dynamical systems, the authors mention that parameters are inferred from normally distributed data. It seems that the authors may have meant normally distributed error in the data, as the measurements that are used to fit the dynamic model are not at all normally distributed.

2) In the section Adaptations to the optimization algorithm, two parameters are introduced with subscripts l and o in the first subsection, while in the subsequent subsections, these parameters seem to be referred to using subscripts l and n. Please improve the consistency in the notation.

3) In the formulae for the residual and the sensitivity of the residual, in the same section as in the previous comment, the norm is divided by 1/lambda; this could be made clearer by multiplying the norm by lambda instead.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachment

Submitted filename: ReviewPlosCompBio202303.docx

Click here for additional data file. (16.1KB, docx)

Attachment

Submitted filename: ReviewPlosCompBio202303.docx

Click here for additional data file. (16.2KB, docx)

PLoS Comput Biol. 2023 Sep 13;19(9):e1010867. doi: 10.1371/journal.pcbi.1010867.r002

Author response to Decision Letter 0

30 Jun 2023

Attachment

Submitted filename: answer.pdf

Click here for additional data file. (180.9KB, pdf)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010867.r003

Decision Letter 1

Kiran R Patil

14 Aug 2023

Dear Dr. Hauber,

We are pleased to inform you that your manuscript 'Uncovering specific mechanisms across cell types in dynamical models' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Kiran R. Patil, Ph.D.

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #4: This reviewer would like to initially thank the authors for taking his comments into account in this revised version. The revised version of the authors’ submission ‘Uncovering Specific Mechanisms across Cell Types in Dynamical Models’ improved upon the initial submission in readability and clarity of the presented work. In addition to the updated components, recommended in my previous review, I would like to explicitly express my appreciation for the addition on interpretation of results in biological models. In this submission, the authors present their method not as being able to solve these major challenges in biological modelling, but as a method that reduces bias in modelling (possibly unknown) subgroups in the data and improves the robustness of parameter estimates against small differences in measurement data between these groups. Concerning the submission overall, I have only one small remark:

1) The numbering in the references section is missing numbers 25 and 26 (while 25 is present in the version of the manuscript indicating the changes that were made). This might be a result of the addition and removal of references during revision.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #4: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010867.r004

Acceptance letter

Kiran R Patil

8 Sep 2023

PCOMPBIOL-D-23-00034R1

Uncovering specific mechanisms across cell types in dynamical models

Dear Dr Hauber,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Text. Comparison of common values for sensitivities.

(PDF)

Click here for additional data file. (53.4KB, pdf)

S1 Fig

(PDF)

Click here for additional data file. (87.3KB, pdf)

S2 Fig

(PDF)

Click here for additional data file. (2.9MB, pdf)

(PDF)

Click here for additional data file. (80.5KB, pdf)

Data Availability Statement

1. Legewie S, Blüthgen N, Herzel H. Mathematical Modeling Identifies Inhibitors of Apoptosis as Mediators of Positive Feedback and Bistability. Sander C, editor. PLoS Comput Biol. 2006. Sep 15;2(9):e120. doi: 10.1371/journal.pcbi.0020120

2. Oppelt A, Kaschek D, Huppelschoten S, Sison-Young R, Zhang F, Buck-Wiese M, et al. Model-based identification of TNFα-induced IKKβ-mediated and IκBα-mediated regulation of NFκB signal transduction as a tool to quantify the impact of drug-induced liver injury compounds. Npj Syst Biol Appl. 2018. Dec;4(1):23.

3. Zitzmann C, Schmid B, Ruggieri A, Perelson AS, Binder M, Bartenschlager R, et al. A Coupled Mathematical Model of the Intracellular Replication of Dengue Virus and the Host Cell Immune Response to Infection. Front Microbiol. 2020. Apr 29;11:725. doi: 10.3389/fmicb.2020.00725

4. Bachmann J, Raue A, Schilling M, Böhm ME, Kreutz C, Kaschek D, et al. Division of labor by dual feedback regulators controls JAK2/STAT5 signaling over broad ligand range. Mol Syst Biol. 2011. Jul 19;7:516. doi: 10.1038/msb.2011.50

5. D’Alessandro LA, Klingmüller U, Schilling M. Deciphering signal transduction networks in the liver by mechanistic mathematical modelling. Biochem J. 2022. Jun 30;479(12):1361–74. doi: 10.1042/BCJ20210548

6. Gottschalk RA, Martins AJ, Angermann BR, Dutta B, Ng CE, Uderhardt S, et al. Distinct NF-κB and MAPK Activation Thresholds Uncouple Steady-State Microbe Sensing from Anti-pathogen Inflammatory Responses. Cell Syst. 2016. Jun;2(6):378–90.

7. Luecke S, Sheu KM, Hoffmann A. Stimulus-specific responses in innate immunity: Multilayered regulatory circuits. Immunity. 2021. Sep;54(9):1915–32. doi: 10.1016/j.immuni.2021.08.018

8. Imoto H, Yamashiro S, Okada M. A text-based computational framework for patient-specific modeling for classification of cancers. iScience. 2022. Mar;25(3):103944. doi: 10.1016/j.isci.2022.103944

9. Merkle R, Steiert B, Salopiata F, Depner S, Raue A, Iwamoto N, et al. Identification of Cell Type-Specific Differences in Erythropoietin Receptor Signaling in Primary Erythroid and Lung Cancer Cells. PLoS Comput Biol. 2016;12(8):e1005049. doi: 10.1371/journal.pcbi.1005049

10. Pfeifer D, Pantic M, Skatulla I, Rawluk J, Kreutz C, Martens UM, et al. Genome-wide analysis of DNA copy number changes and LOH in CLL using high-density SNP arrays. Blood. 2007;109(3):1202–10. doi: 10.1182/blood-2006-07-034256

11. Box GEP, Hill WJ. Discrimination among Mechanistic Models. Technometrics. 1967;9(1):57–71.

12. Steiert B, Timmer J, Kreutz C. L1 regularization facilitates detection of cell type-specific parameters in dynamical systems. Bioinformatics. 2016. Sep 1;32(17):i718–26. doi: 10.1093/bioinformatics/btw461

13. Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267–88.

14. Schmidt M, Fung G, Rosales R. Optimization Methods for ℓ1-Regularization. UBC Tech Rep TR-2009-19. 2009;

15. Dolejsch P, Hass H, Timmer J. Extensions of ℓ1 regularization increase detection specificity for cell-type specific parameters in dynamic models. BMC Bioinformatics. 2019. Dec;20(1):395.

16. She Y. Sparse regression with exact clustering. Electron J Stat [Internet]. 2010. Jan 1 [cited 2022 Jul 6];4. Available from: https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-4/issue-none/Sparse-regression-with-exact-clustering/10.1214/10-EJS578.full

17. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol. 2006. Feb;68(1):49–67.

18. Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-β Superfamily. Cell Syst. 2018. Oct;7(4):422–437.e7.

19. Hass H, Masson K, Wohlgemuth S, Paragas V, Allen JE, Sevecka M, et al. Predicting ligand-dependent tumors from multi-dimensional signaling features. Npj Syst Biol Appl. 2017. Dec;3(1):27. doi: 10.1038/s41540-017-0030-3

20. De Landtsheer S, Lucarelli P, Sauter T. Using Regularization to Infer Cell Line Specificity in Logical Network Models of Signaling Pathways. Front Physiol. 2018. May 22;9:550.

21. Kemmer S, Berdiel-Acer M, Reinz E, Sonntag J, Tarade N, Bernhardt S, et al. Disentangling ERBB Signaling in Breast Cancer Subtypes—A Model-Based Analysis. Cancers. 2022. May 12;14(10):2379. doi: 10.3390/cancers14102379

22. Kok F, Rosenblatt M, Teusel M, Nizharadze T, Gonçalves Magalhães V, Dächert C, et al. Disentangling molecular mechanisms regulating sensitization of interferon alpha signal transduction. Mol Syst Biol. 2020. Jul;16(7):e8955. doi: 10.15252/msb.20198955

23. Lucarelli P, Schilling M, Kreutz C, Vlasov A, Boehm ME, Iwamoto N, et al. Resolving the Combinatorial Complexity of Smad Protein Complex Formation and Its Link to Gene Expression. Cell Syst. 2018. Jan;6(1):75–89.e11. doi: 10.1016/j.cels.2017.11.010

24. Lao-Martil D, Schmitz J, Teusink B, van Riel N. Elucidating yeast glycolytic dynamics at steady state growth and glucose pulses through kinetic metabolic modeling. Metabolic Engineering. 2023. May;77:127–142.

25. Wilks SS. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann Math Stat. 1938 Mar;9(1):60–2.

26. Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, et al. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015. Nov 1;31(21):3558–60. doi: 10.1093/bioinformatics/btv405

27. Becker V, Schilling M, Bachmann J, Baumann U, Raue A, Maiwald T, et al. Covering a Broad Dynamic Range: Information Processing at the Erythropoietin Receptor. Science. 2010. Jun 11;328(5984):1404–8. doi: 10.1126/science.1184913

28. Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Stegle O, editor. Bioinformatics. 2019. Sep 1;35(17):3073–82.

29. Raue A, Schilling M, Bachmann J, Matteson A, Schelker M, Kaschek D, et al. Lessons learned from quantitative dynamical modeling in systems biology. PLoS One. 2013. Sep 30;8(9):e74335. doi: 10.1371/journal.pone.0074335
