PLOS Computational Biology. 2022 Jul 13;18(7):e1010322. doi: 10.1371/journal.pcbi.1010322

Fides: Reliable trust-region optimization for parameter estimation of ordinary differential equation models

Fabian Fröhlich 1,*, Peter K Sorger 1,*
Editor: Hugues Berry2
PMCID: PMC9312381  PMID: 35830470

Abstract

Ordinary differential equation (ODE) models are widely used to study biochemical reactions in cellular networks since they effectively describe the temporal evolution of these networks using mass action kinetics. The parameters of these models are rarely known a priori and must instead be estimated by calibration using experimental data. Optimization-based calibration of ODE models is often challenging, even for low-dimensional problems. Multiple hypotheses have been advanced to explain why biochemical model calibration is challenging, including non-identifiability of model parameters, but there are few comprehensive studies that test these hypotheses, likely because tools for performing such studies are also lacking. Nonetheless, reliable model calibration is essential for uncertainty analysis, model comparison, and biological interpretation.

We implemented an established trust-region method as a modular Python framework (fides) to enable systematic comparison of different approaches to ODE model calibration involving a variety of Hessian approximation schemes. We evaluated fides on a recently developed corpus of biologically realistic benchmark problems for which real experimental data are available. Unexpectedly, we observed high variability in optimizer performance among different implementations of the same mathematical instructions (algorithms). Analysis of possible sources of poor optimizer performance identified limitations in the widely used Gauss-Newton, BFGS and SR1 Hessian approximation schemes. We addressed these drawbacks with a novel hybrid Hessian approximation scheme that enhances optimizer performance and outperforms existing hybrid approaches. When applied to the corpus of test models, we found that fides was on average more reliable and efficient than existing methods using a variety of criteria. We expect fides to be broadly useful for ODE constrained optimization problems in biochemical models and to be a foundation for future methods development.

Author summary

In cells, networks of biochemical reactions involving complex, time-dependent interactions among proteins and other biomolecules regulate diverse processes like signal transduction, cell division, and development. Precise understanding of the time-evolution of these networks requires the use of dynamical models, among which mass-action models based on ordinary differential equations are both powerful and tractable. However, for these models to capture the specifics of a particular cellular process, their parameters must be estimated by minimizing the difference between the simulation (of a dynamical variable such as a particular protein concentration) and experimental data (this is the process of model calibration). This is a difficult and computation-intensive process that has previously been tackled using a range of mathematical techniques whose strengths and weaknesses are not fully understood. In this manuscript, we describe a new software tool, fides, that makes rigorous comparison of calibration methods possible. Unexpectedly, we find that different software implementations of the same mathematical method vary in performance. Using fides, we analyze the causes of this variability, evaluate multiple improvements, and implement a set of generally useful methods and metrics for use in future modeling studies.

1 Introduction

Many cellular biochemical networks exhibit time-varying responses to external and internal stimuli. Modeling networks requires using dynamical models that capture key features of these networks at the level of individual bio-molecules but remain computationally tractable. Developing and testing these models requires time-resolved experimental data, but these datasets are usually severely limited, particularly for mammalian cells: only a subset of model species (e.g., proteins) are typically measured (observed), and these measurements are usually made at discrete timepoints. To partially compensate for the sparsity of measurements, the experimental system is typically observed under a range of conditions that differ in the strength of the stimulus and the presence or absence of inhibitory drugs and genetic mutations.

Given these challenges, mass-action biochemical systems have emerged as an effective means of modeling the temporal evolution of a wide range of cellular networks [1]. Mass action biochemistry is a continuum approximation (i.e., one in which a large number of well-mixed molecules are present in each reaction compartment) that can be modeled by ordinary differential equation (ODE) models. Although cells are not well-mixed systems, ODE modeling can be highly effective for describing biochemical processes in both eukaryotic and prokaryotic cells [2]. Few parameters in these models, e.g., initial reactant concentrations and rate constants, are known a priori; instead, they must be estimated from the (often limited) data. Estimation is commonly formulated as an optimization problem, where the objective function describes the discrepancy between a given solution to the ODE and experimental data. Minimizing this discrepancy can be computationally demanding due to the numerical integration required when evaluating the objective function and its derivatives [3]. Moreover, optimization is complicated by the wide range of time scales and intrinsic non-identifiability of many biochemical models (a property related to their “sloppiness” [4]) and the structure of the experimental data.

Efficiently finding robust solutions to the optimization problem is essential for model analysis, including prediction of unseen conditions and attempts to understand the logic of the underlying biochemical system [5]. Optimized parameter values are often used to initialize model analysis, such as uncertainty quantification via the profile likelihood or sampling approaches [6, 7]. Similarly, parameter optimization is required when models are compared based on goodness of fit, using measures such as Akaike information criterion (AIC) or Bayesian information criterion (BIC), or when other complexity-penalizing methods are applied [8, 9].

In general, the optimization problem for ODE models is non-convex, resulting in few theoretical guarantees of convergence when numerical optimization is employed. It is therefore necessary to rely on empirical evidence to select appropriate optimization algorithms for any specific class of problems [3, 10]. With respect to optimization in general, parameter estimation for ODE-based biochemical models belongs to an uncommon class of problems having four characteristic properties: (i) the optimization problem is often ill-posed due to parameter non-identifiability; (ii) optimization is computationally intensive, involving tens to hundreds of estimated parameters, yet the problems do not qualify as “high-dimensional” problems in the broader optimization literature, since, for example, there is rarely concern that the Hessian cannot be stored in memory; (iii) computation time for numerically solving the optimization problem is dominated by evaluation of the objective function and its derivatives whereas computation time required for a proposed parameter update itself is negligible; (iv) since models are inexact and experimental data are noisy, the residual values between simulation and data may be much larger than zero, even at the global minimum of the optimization problem (such problems are commonly called non-zero residual problems). The existing benchmarks for general purpose optimization, such as the CUTE(r/st) [11–13] set of benchmarks, do not cover models having these four characteristics. Thus, domain-specific benchmarks are required to select optimal optimization algorithms.

Only two collections of models and accompanying experimental data have been proposed as benchmark problems in the literature to date. Villaverde et al. [14] proposed a set of 6 published models covering metabolic, developmental and signaling processes in different organisms, but only two problems include real experimental data. More recently, Hass et al. [15] proposed a set of 20 published models covering signal transduction, immunological regulation, and epigenetic effects in a variety of organisms, all with real experimental data. The models in the Hass et al. corpus are small to medium sized, making them computationally tractable, but they are biologically realistic and the basis of a wide variety of previously published biological discoveries. The data are also realistic in their inclusion of Western blots, flow cytometry, and immunofluorescence microscopy. As mentioned above, such data typically provide indirect measurements of a subset of molecular species. Moreover, measurements are noise-corrupted and limited in time resolution, necessitating the use of data from multiple experimental conditions and the introduction of parameter-dependent observable functions. Unfortunately, this prohibits the application of more efficient calibration techniques, such as quasi-linearization methods [16, 17] that require direct observation of all model species. Both the structure of biochemical models and limitations in the data impose a non-identifiability that results in parameter optimization problems that are not well-posed in a mathematical sense, violating a crucial assumption of many general-purpose optimization algorithms. For these reasons, the corpus of Hass et al. [15] represents a unique and powerful resource for the evaluation of optimization methods for biochemical models under realistic conditions of varying data-richness and parameter identifiability.

Trust-region methods initialized from hundreds to thousands of random initial parameter values (often referred to as “multi-start”) have performed well for a broad set of biochemical ODE models [15, 18]. Trust-region methods are versatile methods that do not make any assumption about the underlying model and data structure except that the objective function must be sufficiently smooth. They use local (quadratic) approximations of the objective function to propose parameter updates and then iteratively refine the local neighborhood in which the local approximation is expected to adequately recapitulate the shape of the true objective function, i.e., the trust region [19]. Popular implementations of trust-region methods are available in the MATLAB optimization toolbox and the scipy Python optimization module. However, for many problems encountered in biology, including low dimensional biochemical models with as few as 20 parameters, these optimizers do not consistently converge to parameter values that yield similar values for the objective function [15], strongly suggesting failure to reach a global optimum—or even a local optimum. For example, the benchmark study by Hass et al. performed optimization for a model based on work of Fujita et al. [20], which describes Epidermal Growth Factor (EGF)-mediated activation of the Protein Kinase B pathway (also known as the PI3K/AKT pathway). Hass et al. found that the difference in negative log-likelihood between the best and second best parameter values was >5, exceeding the statistical threshold for model rejection according to AIC and BIC criteria [21]. Model selection is challenging in these cases, because poor optimizer performance could easily lead to the erroneous rejection of a model if the starting point that yields the best optimization run had been omitted.

More generally, “optimal” parameter values that yield inconsistent objective function values, i.e., values that do not cluster in a small set of distinct values (Fig 1A), can indicate either (i) that the optimization converged on a few of the many critical points (local minima, saddle points) (Fig 1B right) or (ii) that the optimization terminated before convergence to any (local) minimum was achieved (Fig 1B left) [22]. Many of the problems of interest in biochemistry and cell biology involve ODE models and datasets that have multiple local minima, which can be a result of curvature of the model manifold [23]. In these cases, repeated convergence of multiple optimization runs on a small set of similar objective function values may not represent a problem with the optimization approach itself, but rather arise from model non-identifiability. This setting contrasts with the situation where the objective function values are inconsistent, despite a large number of runs that converge on one or a set of minima (setting rigorous thresholds for what is considered “consistent” is a tricky problem in and of itself, which we revisit later in the manuscript). In such a situation, it is unclear whether optimization is non-convergent or the objective function is very “rugged” [3] with many local minima, not all of which may have been identified.

Fig 1. Illustration of final objective function value consistency and possible objective function landscapes.


A: Waterfall plot with examples of consistent (blue) and inconsistent (red) final objective function values. B: Possible objective function landscapes that could explain the waterfall plots in A.

Non-convergent optimization can also result from noisy model simulations, in which lax integration tolerances result in inaccurate numerical evaluation of the value of the objective function and its gradient [24]. Inaccurate gradients often result in poor parameter update proposals, slowing the search in parameter space. Inaccurate objective function values can also result in incorrect rejection of parameter updates, erroneously suggesting convergence to a minimum and leading to premature termination of optimization. For example, Tönsing et al. [24] found that optimization runs that yielded similar, but inconsistent objective function values were often located in the neighborhood of the same local minima, suggesting that these runs had been prematurely terminated. The authors suggested that a nudged elastic band method, which aims to identify shortest connecting paths between optima, might be effective in improving consistency.

Premature termination can also arise from problems with the optimization method itself. Dauphin et al. [25] found that saddle points are prevalent in the objective functions of neural network models and optimization methods that do not account for directions of negative curvature may perform poorly in the vicinity of saddle points. However, neither the prevalence of saddle points nor their impact on premature optimizer termination has been investigated in the case of biochemical ODE models. Lastly, Transtrum et al. [23] suggested that the use of Gauss-Newton Hessian approximations might not work well for sloppy biochemical models. Sloppiness is encountered when the objective function Hessian has a broad eigenvalue spectrum, which results in parameter non-identifiability and an ill-posed optimization problem. Sloppiness is believed to be a universal property of biochemical models [4]. However, the geodesic acceleration proposed by Transtrum et al. [23] to address the limitations of Gauss-Newton Hessian approximation has not been widely adopted, likely due to the complexity of its implementation and the computational cost of determining directional second-order derivatives.

Overall, the results described above show that early optimizer termination is a recurrent issue with ODE-based biochemical models and that it has a variety of causes. However, a comprehensive evaluation of this issue on a set of relevant benchmark problems is missing, as is the development and testing of methods to identify or resolve the underlying causes of optimization failures. In principle, this could be addressed by adapting and then combining several optimization algorithms. For example, it might be possible to resolve issues with the Gauss-Newton Hessian approximation by using alternative approximation schemes, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) [26–29] scheme. Issues with saddle points could be resolved by employing symmetric rank-one (SR1) [30] approximations that account for negative curvature directions. However, many optimization algorithms were written decades ago and are difficult for practitioners familiar with contemporary programming languages such as Python to customize or extend. Many existing methods lack reporting functions that provide users with statistics about individual optimization traces. These limitations make it difficult to diagnose problems with optimization and to resolve them with algorithmic improvements.

To tackle these and other challenges associated with ODE model optimization, this paper re-implements a standard trust-region algorithm in Python and uses it to study a range of hypotheses about the causes and potential solutions for poor optimizer performance. We find that the use of an inaccurate Hessian approximation is a major contributor to poor optimization performance and therefore propose a novel hybrid Hessian approximation scheme. We demonstrate that this scheme outperforms existing approaches on the best available corpus of benchmark biochemical network problems.

2 Materials and methods

For the purpose of this study, we considered four different optimizers that all implement the interior-trust-region algorithm proposed by Coleman and Li [31]: fmincon, referring to the MATLAB function of the same name with trust-region-reflective as algorithm and ldl-factorize as subproblem algorithm; lsqnonlin, referring to the MATLAB function of the same name; ls_trf, referring to the scipy function least_squares with the trf algorithm; and fides, the new implementation developed in the current manuscript. Below we describe algorithmic and implementation details of fides (a summary is provided in Table 1) and of the benchmark problems we used to evaluate these algorithms.

Table 1. Feature overview for different trust-region optimization implementations.

The non least-squares column indicates whether the method is applicable to non least-squares problems. The free column indicates whether the implementation is freely available or proprietary software.

Optimizer | Subspace | Non least-squares | BFGS/SR1 | Programming language | Free
lsqnonlin | S2D | ✗ | ✗ | MATLAB | ✗
fmincon | S2D | ✓ | ✗ | MATLAB | ✗
ls_trf | ℝ^nθ, S2D | ✗ | ✗ | Python | ✓
fides | ℝ^nθ, S2D | ✓ | ✓ | Python | ✓

2.1 Model formulation

When applied to a biochemical system, an ODE model describes the temporal evolution of abundances of nx different molecular species xi. The temporal evolution of x is determined by the vector field f and the initial condition x0:

$\dot{x} = f(t, x, \theta), \qquad x(t_0) = x_0(\theta). \qquad (1)$

Both f and x0 depend on the unknown parameters $\theta \in \Theta \subseteq \mathbb{R}^{n_\theta}$ such as catalytic rates or binding affinities. Restricting optimization to the parameter domain Θ can constrain the parameter search space to values that are realistic based on physicochemical theory and helps prevent numerical integration failures associated with extreme parameter values. For most problems, Θ is the tensor product of scalar search intervals (li, ui) with lower and upper bounds li < ui that satisfy $l_i, u_i \in \mathbb{R} \cup \{-\infty, \infty\}$ for every parameter θi.

Experiments usually provide information about observables y which depend on abundances x and parameters θ. A direct measurement of x is usually not possible. Thus, the dependence of observables on abundances and parameters is described by

$y(t, \theta) = h(x(t, \theta), \theta). \qquad (2)$

Implementation in this study: All methods described here use CVODES from the SUNDIALS suite [32] for numerical integration of model equations. CVODES is a multi-step implicit solver for stiff- and non-stiff ODE initial value problems.
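To make Eqs (1)–(2) concrete, the following minimal sketch simulates a hypothetical two-species mass-action model and evaluates an observable at discrete time points. It uses SciPy's solve_ivp as a simple stand-in for the CVODES-based pipeline described above; the reaction scheme, parameter values and observable are illustrative and not part of the benchmark corpus.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-species mass-action model (not one of the benchmark problems):
# A -> B with rate k1, B -> A with rate k2; theta = (k1, k2, s).
def f(t, x, k1, k2):
    a, b = x
    return [-k1 * a + k2 * b, k1 * a - k2 * b]

def h(x, s):
    # Observable (Eq 2): scaled abundance of the second species.
    return s * x[1]

theta = np.array([0.1, 0.05, 2.0])   # (k1, k2, s), illustrative values
x0 = np.array([1.0, 0.0])            # initial condition x(t0) = x0(theta)
t_eval = np.linspace(0.0, 60.0, 7)   # discrete measurement time points t_j

sol = solve_ivp(f, (t_eval[0], t_eval[-1]), x0, args=(theta[0], theta[1]), t_eval=t_eval)
y = np.array([h(x, theta[2]) for x in sol.y.T])  # observables y(t_j, theta)
print(y)
```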

2.2 Optimization problem

To generate models useful in the study of actual biological systems, model parameters θ must be inferred from experimental data, which are typically incomplete and subject to measurement noise. A common assumption is that the measurement noise for nt time-points tj and ny observables yi is additive, independent and normally distributed for all time-points:

$\bar{y}_{ij} = y_i(t_j, \theta) + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{\text{iid}}{\sim} \mathcal{N}\left(0, \sigma_{ij}^2(\theta)\right). \qquad (3)$

Thus, model parameters can be inferred from experimental data by maximizing the likelihood, yielding a maximum likelihood estimate (MLE). However, the evaluation of the likelihood function involves the computation of several products of large terms, which can be numerically unstable. Thus, the negative log-likelihood

$J(\theta) = \frac{1}{2}\sum_{i=1}^{n_y}\sum_{j=1}^{n_t}\left[\log\left(2\pi\sigma_{ij}^2(\theta)\right) + \left(\frac{\bar{y}_{ij} - y_i(t_j, \theta)}{\sigma_{ij}(\theta)}\right)^2\right] \qquad (4)$

is typically used as objective function that is minimized. As the logarithm is a strictly monotonically increasing function, the minimization of J(θ) is equivalent to the maximization of the likelihood. Therefore, the corresponding minimization problem

$\theta^* = \arg\min_{\theta \in \Theta} J(\theta), \qquad (5)$

will infer the MLE parameters. If the noise variance σij2 does not depend on the parameters θ, the objective function (4) has a weighted least-squares formulation. As we discuss later, properties of a least-squares formulation can be exploited in specialized optimization methods. Optimizers that do not require least-squares structure can also work with other noise models [33].
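As a minimal illustration of the objective in Eq (4), the sketch below evaluates the negative log-likelihood for given measurements, simulated observables and noise standard deviations; the array shapes and toy values are illustrative assumptions, not part of the benchmark setup.

```python
import numpy as np

def negative_log_likelihood(y_bar, y_sim, sigma):
    """Negative log-likelihood J(theta) following Eq (4).

    y_bar, y_sim, sigma: arrays of shape (n_y, n_t) holding measurements,
    simulated observables y_i(t_j, theta) and noise standard deviations.
    """
    residuals = (y_bar - y_sim) / sigma
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + residuals**2)

# Toy example with synthetic values (illustrative only).
y_bar = np.array([[1.1, 2.2, 2.9]])
y_sim = np.array([[1.0, 2.0, 3.0]])
sigma = np.array([[0.2, 0.2, 0.2]])
print(negative_log_likelihood(y_bar, y_sim, sigma))
```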

Implementation in this study: For the MATLAB optimizers fmincon and lsqnonlin, the objective function and its derivatives were evaluated using data2dynamics [34] (commit b1e6acd), which was also used in the study by Hass et al. [15]. For the Python optimizers ls_trf and fides, the objective function and its derivatives were evaluated using AMICI [35] (version 0.11.23) and pyPESTO (version 0.2.10).

2.3 Trust-region optimization

Trust-region methods minimize the objective function J by iteratively updating parameter values θk+1 = θk + Δθk according to the local minimum

$\Delta\theta_k = p_k^* = \arg\min_{p}\, m_k(p) \quad \text{s.t.} \quad \|p\| \le \Delta_k \qquad (6)$

of an approximation mk to the objective function. Δk is the trust-region radius that restricts the norm of parameter updates. The optimization problem (6) is known as the trust-region subproblem. In most applications, a local, quadratic approximation

$m_k(p) = f_k + g_k^Tp + \frac{1}{2}p^TB_kp \qquad (7)$

is used, where fk = J(θk) is the value, gk = ∇J(θk) is the gradient and Bk = ∇2J(θk) is the Hessian of the objective function evaluated at θk.

The trust-region radius Δk is updated in every iteration depending on the ratio ρk between the predicted decrease -mk(pk*) and actual decrease in objective function value ΔJ=J(θk)-J(θk+pk*) [19]. The step is accepted if ρk exceeds some threshold μ ≥ 0. When boundary constraints (on parameter values) are applied, the predicted decrease is augmented by an additional term that accounts for the parameter transformation (see Section 2.6) [31].

Implementation in this study: All optimizers evaluated in this study use μ = 0 as acceptance threshold. They all increase the trust-region radius Δk by a factor of 2 if the predicted change in objective function value is accurate (ρk > 0.75) and the local minimum is at the edge of the trust region ($\|p_k^*\| > 0.9\Delta_k$ for fmincon, lsqnonlin and fides, $\|p_k^*\| > 0.95\Delta_k$ for ls_trf). All optimizers decrease the trust-region radius if the predicted change in objective function value is inaccurate (ρk < 0.25), but fmincon, lsqnonlin and fides set the trust-region radius to $\min(\Delta_k, \|p_k^*\|)/4$, while ls_trf sets it to $\Delta_k/4$.

When the predicted objective function decrease is negative ($-m_k(p_k^*) < 0$, i.e., an increase in value is predicted), fides and ls_trf set ρk to 0.0. Positive values for $m_k(p_k^*)$ may arise from the augmentation accounting for boundary constraints. Setting ρk to 0.0 prevents inadvertent increases to Δk or step acceptance when $-m_k(p_k^*)$ and ΔJ are both negative. However, in contrast to fides, ls_trf does not automatically reject the respective step proposals and only does so if ΔJ < 0. fmincon and lsqnonlin only reject step proposals with ΔJ < 0, but do not take the sign of $m_k(p_k^*)$ into account when updating Δk.

When the objective function cannot be evaluated—for ODE models this is typically the result of an integration failure—all optimizers decrease the trust-region radius by setting it to $\min(\Delta_k, \|p_k^*\|)/20$ (fmincon and lsqnonlin), $\min(\Delta_k, \|p_k^*\|)/4$ (fides) or $\Delta_k/4$ (ls_trf). These subtle differences in implementation are likely the result of incomplete specification of the algorithm in the original publication [31]. In particular, handling of $m_k(p_k^*) > 0$, which does not occur for standard trust-region methods, was not described, and developers needed to independently work out custom solutions.
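The following sketch condenses the acceptance and radius-update rules described above into a single helper, following the fides variant; the function name and calling convention are illustrative and not part of any of the packages discussed here.

```python
def update_radius(delta, step_norm, pred_decrease, actual_decrease, mu=0.0):
    """Trust-region acceptance and radius update (illustrative, fides variant).

    pred_decrease corresponds to -m_k(p_k^*) and actual_decrease to
    J(theta_k) - J(theta_k + p_k^*); mu is the acceptance threshold.
    Returns the updated radius and whether the step is accepted.
    """
    if pred_decrease <= 0.0:
        rho = 0.0  # a predicted increase is treated as ratio 0 (step rejected)
    else:
        rho = actual_decrease / pred_decrease

    accept = rho > mu

    if rho > 0.75 and step_norm > 0.9 * delta:
        delta *= 2.0  # model accurate and step at the trust-region edge: expand
    elif rho < 0.25:
        delta = min(delta, step_norm) / 4.0  # model inaccurate: shrink
    return delta, accept
```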

2.4 Hessian approximation

Constructing the local approximation (7) that defines the trust-region subproblem (6) requires the evaluation of the gradient gk and Hessian Bk of the objective function at the current parameter values θk. While the gradient gk can be efficiently and accurately computed using first order forward or adjoint sensitivity analysis [36], it is computationally more demanding to compute the Hessian Bk [37]. Therefore, several approximation schemes have been proposed that approximate Bk using first order sensitivity analysis. In the following, we provide a brief description of the approximation schemes considered in this study; an overview of the schemes and their characteristics is provided in Table 2.

Table 2. Overview of properties of different Hessian approximation schemes.

BFGS is the Broyden-Fletcher-Goldfarb-Shanno algorithm. SR1 is the Symmetric Rank-one update. GN is the Gauss-Newton approximation. SSM is the Structured Secant Method. TSSM is the Totally Structured Secant Method. FX is the hybrid method proposed by Fletcher and Xu [45]. GNSBFGS is the Gauss-Newton Structured BFGS method. The construction column indicates whether pointwise evaluation is possible or whether iterative construction is necessary. The positive semi-definite column indicates whether the approximation preserves positive semi-definiteness given a positive semi-definite initialization.

Scheme | Construction | Positive semi-definite | Convergence requirement | Requires least-squares
BFGS | iterative | ✓ | — | ✗
SR1 | iterative | ✗ | — | ✗
GN | pointwise | ✓ | ∥r(θ*)∥ = 0 | ✓
SSM | pointwise + iterative | ✗ | — | ✓
TSSM | pointwise + iterative | ✗ | — | ✓
FX | pointwise + iterative | ✓ | — | ✓
GNSBFGS | pointwise + iterative | ✓ | λmin(∇²J(θ*)) > 0 | ✓

Gauss-Newton approximation: The Gauss-Newton (GN) approximation $B_k^{(GN)}$ is based on a linearization of the residuals rij

$r_{ij}(\theta) = \frac{\bar{y}_{ij} - y_i(t_j, \theta)}{\sigma_{ij}(\theta)}, \qquad B_k^{(GN)} = \frac{1}{2}\sum_{i=1}^{n_y}\sum_{j=1}^{n_t} \nabla r_{ij}(\theta_k)\,\nabla r_{ij}^T(\theta_k), \qquad (8)$

which yields a symmetric and positive semi-definite approximation to Bk and does not account for negative curvature. At the maximum likelihood estimate, B(GN) is equal to the negative empirical Fisher Information Matrix assuming σij does not depend on parameter values θ. For parameter-dependent σ, the log(σ) term in (4) cannot be assumed to be constant, which results in a non least-squares optimization problem. For non least-squares problems, the adequacy and formulation of the GN approximation is not well established. Thus, Raue [38] proposed to transform the problem into a least-squares form by introducing additional error residuals rije and adding a corresponding correction to the Gauss-Newton approximation B(GN) from (8), yielding B(GNe):

$r_{ij}^e(\theta) = \sqrt{2\log(\sigma_{ij}(\theta)) + C}, \qquad B_{kl}^{(GNe)} = B_{kl}^{(GN)} + \sum_{i=1}^{n_y}\sum_{j=1}^{n_t}\frac{\frac{\partial\sigma_{ij}}{\partial\theta_l}\frac{\partial\sigma_{ij}}{\partial\theta_k}}{\sigma_{ij}(\theta)^2\left(2\log(\sigma_{ij}(\theta)) + C\right)}, \qquad (9)$

where C is some arbitrary, but sufficiently large constant so that 2 log(σij(θ)) + C > 0. This condition ensures that residuals are real-valued and the approximation B(GNe) is positive semi-definite, but inherently makes optimization a non-zero residual problem. Adding the constant C to residuals adds a constant to the objective function value and, thus, neither influences its gradient and Hessian nor the location of its minima. However, C does enter the GNe approximation, with unclear implications. Instead, Stapor et al. [37] suggested that one ignores the second order derivative of the log(σ) term in (4), which corresponds to the limit limC→∞ B(GNe) = B(GN).
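For intuition, the following sketch assembles the GN part of the approximation from the Jacobian of the residuals, following the outer-product structure of Eq (8); the function and argument names are illustrative, and the GNe correction for parameter-dependent σ would be added analogously.

```python
import numpy as np

def gauss_newton_hessian(residual_jacobian):
    """Gauss-Newton approximation following Eq (8) (illustrative sketch).

    residual_jacobian: array of shape (n_residuals, n_theta) whose rows are
    the gradients of the individual residuals r_ij with respect to theta.
    """
    J = np.asarray(residual_jacobian)
    # 1/2 * sum_ij grad r_ij grad r_ij^T, written as a single matrix product
    return 0.5 * J.T @ J
```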

Iterative approximations: In contrast to the GN approximation, Broyden-Fletcher-Goldfarb-Shanno (BFGS) or Symmetric Rank-one (SR1) are iterative approximation schemes, in which the approximation in the next step

$B_{k+1} = B_k + \Delta(s, z, B_k)$

is constructed based on the approximation in the current step Bk and some update Δ(s, z, Bk), where s = Δθk and $z = g_{k+1} - g_k$.

The BFGS update scheme

$\Delta^{(BFGS)}(s, z, M) = \frac{zz^T}{z^Ts} - \frac{(Ms)(Ms)^T}{s^TMs}$

guarantees a positive semi-definite approximation as long as a curvature condition zTs > 0 is satisfied and the initial approximation B0(BFGS) is positive semi-definite [19]. Thus, the update scheme is usually only applied with line search methods that guarantee satisfaction of the curvature condition by selecting the step length according to (strong) Wolfe conditions [19]. However, BFGS can also be used in trust-region methods by rejecting updates when the curvature condition is not satisfied, although this invalidates some theoretical convergence guarantees [19].
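A minimal sketch of a BFGS update that skips the update when the curvature condition is violated, mirroring the trust-region usage described above; the function name and the choice to simply return the previous approximation are illustrative.

```python
import numpy as np

def bfgs_update(B, s, z):
    """Apply the BFGS update Delta^(BFGS)(s, z, B) to B (illustrative sketch)."""
    if z @ s <= 0:
        # Curvature condition violated: skip the update to preserve
        # positive semi-definiteness of the approximation.
        return B
    Bs = B @ s
    return B + np.outer(z, z) / (z @ s) - np.outer(Bs, Bs) / (s @ Bs)
```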

The SR1 update scheme

$\Delta^{(SR1)}(s, z, M) = \frac{(z - Ms)(z - Ms)^T}{(z - Ms)^Ts}$

can also yield indefinite approximations, incorporating negative curvature information, and has no step requirements beyond ensuring that the denominator of the update is non-zero.
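An analogous sketch of the SR1 update; the denominator safeguard shown here is a common heuristic and an assumption on our part, since the text above only requires that the denominator be non-zero.

```python
import numpy as np

def sr1_update(B, s, z, skip_tol=1e-8):
    """Apply the SR1 update Delta^(SR1)(s, z, B) to B (illustrative sketch)."""
    d = z - B @ s
    denom = d @ s
    if abs(denom) <= skip_tol * np.linalg.norm(d) * np.linalg.norm(s):
        return B  # denominator too close to zero: skip the update
    return B + np.outer(d, d) / denom
```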

Structured secant approximations: The accuracy of the GN approximation depends on the magnitude of the residuals, since the approximation error is the sum of products of the residuals and the residual Hessians

$B_k = B_k^{(GN)} + \frac{1}{2}\sum_{i=1}^{n_y}\sum_{j=1}^{n_t} r_{ij}(\theta_k)\,\nabla^2 r_{ij}(\theta_k). \qquad (10)$

For non-zero residual problems, in which there are indices i, j such that rij(θ*) ≫ 0, or in the presence of strong non-linearities, the second order term in (10) does not vanish and the GN approach is known to perform poorly; it can even diverge [39, 40]. This issue is addressed in structured secant methods [39, 41, 42], which combine the pointwise GN approximation with an iterative BFGS approximation Ak of the second order term. In the Structured Secant Method (SSM) [42], the matrix Ak is updated using a BFGS scheme:

$B_{k+1}^{(SSM)} = B_{k+1}^{(GNe)} + A_{k+1}$
$A_{k+1} = A_k + \Delta^{(BFGS)}\left(s, z^\#, B_{k+1}^{(GNe)} + A_k\right)$
$z^\# = \left(\nabla r(\theta_{k+1}) - \nabla r(\theta_k)\right)^T r(\theta_{k+1}).$

Similarly, the Totally Structured Secant Method (TSSM) [43] scales Ak with the residual norm, to mimic the product structure in the second order term in (10), and, accordingly, scales the update to Ak with the inverse of the residual norm:

$B_{k+1}^{(TSSM)} = B_{k+1}^{(GNe)} + \|r(\theta_{k+1})\|\,A_{k+1}$
$A_{k+1} = A_k + \frac{\Delta^{(BFGS)}\left(s, \tilde{z}, B_{k+1}^{(GNe)} + \|r(\theta_{k+1})\|\,A_k\right)}{\|r(\theta_{k+1})\|}$
$\tilde{z} = B_{k+1}^{(GNe)}s + z^\#\frac{\|r(\theta_{k+1})\|}{\|r(\theta_k)\|}.$

Despite the use of a BFGS updating scheme, the SSM and TSSM approximations do not preserve positive semi-definiteness, as the matrix Bk(GNe) is updated at every iteration without any additional safeguards. Structured secant approximations have been popularized by the NL2SOL toolbox [44], but are not featured in the standard optimization libraries in MATLAB or Python.

Hybrid schemes: Other hybrid schemes can dynamically switch between GN approximations and iterative updates when some metric indicates that the considered problem has non-zero residual structure. For example, Fletcher and Xu [45] proposed an approach to detect non-zero residuals by computing the normalized change in the residual norm and applying BFGS updates when the change is smaller than some tolerance ϵFX:

$B_{k+1}^{(FX)} = \begin{cases} B_k^{(FX)} + \Delta^{(BFGS)}\left(s, z, B_k^{(FX)}\right) & \text{if } \frac{\|r(\theta_k)\| - \|r(\theta_{k+1})\|}{\|r(\theta_k)\|} < \epsilon_{FX} \\ B_{k+1}^{(GNe)} & \text{otherwise.} \end{cases}$

As Bk(GNe) is positive semi-definite and the BFGS updates preserve this property, the FX approximations are always positive semi-definite.
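The switching rule can be expressed compactly in code. The sketch below is illustrative: it bundles the BFGS update term with the residual-norm test described above, and the function and argument names are our own.

```python
import numpy as np

def bfgs_delta(M, s, z):
    """BFGS update term Delta^(BFGS)(s, z, M); returns a zero update if the
    curvature condition z^T s > 0 is violated (safeguard added for this sketch)."""
    if z @ s <= 0:
        return np.zeros_like(M)
    Ms = M @ s
    return np.outer(z, z) / (z @ s) - np.outer(Ms, Ms) / (s @ Ms)

def fx_update(B_fx, B_gne_next, s, z, r_prev, r_next, eps_fx=0.2):
    """Fletcher-Xu style hybrid switching (illustrative sketch of the rule above)."""
    rel_decrease = (np.linalg.norm(r_prev) - np.linalg.norm(r_next)) / np.linalg.norm(r_prev)
    if rel_decrease < eps_fx:
        return B_fx + bfgs_delta(B_fx, s, z)  # keep refining the BFGS-type approximation
    return B_gne_next  # fall back to the Gauss-Newton(e) approximation
```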

To address the issue of possibly indefinite approximations in the SSM and TSSM approaches, Zhou and Chen proposed a Gauss-Newton structured BFGS method (GNSBFGS) [40].

$B_{k+1}^{(GNSBFGS)} = \begin{cases} B_{k+1}^{(GNe)} + A_{k+1} & \text{if } \frac{z_{k+1}^Ts_{k+1}}{s_{k+1}^Ts_{k+1}} > \epsilon_{GNSBFGS} \\ B_{k+1}^{(GNe)} + \|r(\theta_{k+1})\|\,I & \text{otherwise} \end{cases}$
$A_{k+1} = \begin{cases} A_k + \Delta^{(BFGS)}\left(s, \tilde{z}, A_k\right) & \text{if } \frac{z_{k+1}^Ts_{k+1}}{s_{k+1}^Ts_{k+1}} > \epsilon_{GNSBFGS} \\ A_k & \text{otherwise} \end{cases}$
$\tilde{z} = z^\#\frac{\|r(\theta_{k+1})\|}{\|r(\theta_k)\|}$

This scheme combines the TSSM approach with the dynamic updating of the FX approach. As the sum of two positive semi-definite matrices, $B_{k+1}^{(GNSBFGS)}$ is always positive semi-definite. The authors demonstrate that the term $\frac{z_{k+1}^Ts_{k+1}}{s_{k+1}^Ts_{k+1}}$ plays a similar role as the $\frac{\|r(\theta_k)\| - \|r(\theta_{k+1})\|}{\|r(\theta_k)\|}$ term in the FX algorithm. However, they only prove convergence if the tolerance 2ϵGNSBFGS is smaller than the smallest eigenvalue λmin(∇²J(θ*)) of the Hessian at the optimal parameters and if ∇²J(θ*) is positive definite. This condition is not met for problems with singular Hessians, which are often observed for non-identifiable problems.

Implementation in this study: fmincon and lsqnonlin were only evaluated using the GNe approximation, as implemented in data2dynamics. ls_trf can only be applied using the GNe approximation. fides was evaluated using BFGS and SR1 via the respective native implementations, in addition to GN and GNe as implemented in AMICI. We used the default value of C = 50 for the computation of GNe in both data2dynamics and AMICI. We provide implementations of the BFGS, SR1, SSM, TSSM, FX and GNSBFGS schemes in fides. FX, SSM, TSSM and GNSBFGS were applied using GNe, as they require a least-squares problem structure. The hyperparameters ϵFX = 0.2 and ϵGNSBFGS = 10⁻⁶ were picked based on recommended values in the respective original publications.

2.5 Solving the trust-region subproblem

In principle, the trust-region subproblem (6) can be solved exactly [19]. Moré proposed an approach using eigenvalue decomposition of Bk [46]. Yet, Byrd et al. [47] noted the high computational cost of this approach and suggested an approximate solution by solving the trust-region problem over a two dimensional subspace S2D, spanned by the gradient $g_k$ and the Newton search direction $B_k^{-1}g_k$, instead of $\mathbb{R}^{n_\theta}$. However, for objective functions requiring numerical integration of ODE models, the cost of eigenvalue decomposition is generally negligible for problems involving fewer than 10³ parameters.

A crucial issue for the two-dimensional subspace approach is the handling of indefinite (approximate) Hessians. For an indefinite Bk, the Newton search direction may not represent a direction of descent. This can be addressed by dampening Bk [19], but for boundary-constrained problems additional considerations arise and require the identification of a direction of strong negative curvature [31].

Implementation in this study: fmincon and lsqnonlin implement optimization only over S2D, where the Newton search direction is computed using preconditioned direct factorization. For preconditioning and direct factorization, fmincon and lsqnonlin employ Cholesky and QR decomposition respectively, which both implement dampening for numerically singular Bk. fides and ls_trf implement optimization over S2D (denoted by 2D in text and figures) and $\mathbb{R}^{n_\theta}$ (denoted by ND in text and figures). For ls_trf, we specified tr_solver = “lsmr” for optimization over S2D. To compute the Newton search direction, ls_trf and fides both use least-squares solvers, which is equivalent to using the Moore-Penrose pseudoinverse. fides uses the direct solver scipy.linalg.lstsq, with gelsd as LAPACK driver, while ls_trf uses the iterative, regularized scipy.sparse.linalg.lsmr solver. In fides, the negative curvature direction of indefinite Hessians is computed using the eigenvector corresponding to the largest negative eigenvalue (computed using scipy.linalg.eig).
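A minimal sketch of the two ingredients just described: computing the Newton search direction via a least-squares solve and extracting a negative-curvature direction from an eigendecomposition. It uses scipy.linalg.lstsq and, for simplicity, the symmetric eigensolver scipy.linalg.eigh rather than scipy.linalg.eig; names and the exact fallback behavior are illustrative.

```python
import numpy as np
from scipy.linalg import lstsq, eigh

def subspace_directions(B, g):
    """Newton and negative-curvature directions for the 2D subspace (sketch).

    The least-squares solve is equivalent to applying the Moore-Penrose
    pseudoinverse of B to g, so it also handles singular approximations.
    """
    newton = -lstsq(B, g)[0]
    eigenvalues, eigenvectors = eigh(B)  # ascending eigenvalues of symmetric B
    neg_curvature = eigenvectors[:, 0] if eigenvalues[0] < 0 else None
    return newton, neg_curvature
```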

2.6 Handling of boundary constraints

The trust-region subproblem (6) does not account for boundary constraints, which means that θk + Δθk may not satisfy these constraints. For this reason, Coleman and Li [31] introduced a rescaling of the optimization variables depending on how close the current values are to the parameter boundary ∂Θ. This rescaling is realized through a vector

$v_i(\theta) = \begin{cases} \theta_i - u_i & \text{if } \nabla J(\theta)_i < 0 \text{ and } u_i < \infty \\ \theta_i - l_i & \text{if } \nabla J(\theta)_i \ge 0 \text{ and } l_i > -\infty \\ -1 & \text{if } \nabla J(\theta)_i < 0 \text{ and } u_i = \infty \\ 1 & \text{if } \nabla J(\theta)_i \ge 0 \text{ and } l_i = -\infty \end{cases}$

which yields transformed optimization variables

$\hat{\theta}_k = D_k^{-1}\theta_k = \mathrm{diag}\left(|v(\theta_k)|^{\frac{1}{2}}\right)^{-1}\theta_k$

and a transformed Hessian

$\hat{B}_k = D_kB_kD_k + \mathrm{diag}(g_k)\,\nabla v(\theta_k). \qquad (11)$

As the second term in (11) is positive semi-definite, this handling of boundary constraints can also regularize the trust-region sub-problem, although this is not its primary intent.
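The following sketch computes the scaling vector v(θ) and the matrix Dk from the case distinction above; finite and infinite bounds are encoded with np.inf, and the function name is illustrative.

```python
import numpy as np

def coleman_li_scaling(theta, grad, lb, ub):
    """Scaling vector v(theta) and scaling matrix D_k (illustrative sketch)."""
    v = np.ones(len(theta))
    for i in range(len(theta)):
        if grad[i] < 0:
            v[i] = theta[i] - ub[i] if np.isfinite(ub[i]) else -1.0
        else:
            v[i] = theta[i] - lb[i] if np.isfinite(lb[i]) else 1.0
    D = np.diag(np.sqrt(np.abs(v)))  # D_k = diag(|v(theta_k)|^(1/2))
    return v, D
```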

Coleman and Li [31] also propose a stepback strategy in which solutions to (6) are reflected at the parameter boundary ∂Θ. Since this reflection defines a one-dimensional search space, the local minimum can be computed analytically at negligible computational cost. To ensure convergence, Δθk is then selected based on the lowest mk(p) value among (i) the reflection of p* at the parameter boundary, (ii) the constrained Cauchy step, which is the minimizer of (7) along the gradient truncated at the parameter boundary, and (iii) p* truncated at the parameter boundary.

Implementation in this study: fmincon, lsqnonlin and ls_trf all implement rescaling, but only allow for a single reflection at the boundary [48]. In contrast, fides implements rescaling and also allows for a single or arbitrarily many reflections until the first local minimum along the reflected path is encountered.

2.7 Optimizer performance evaluation

To evaluate optimization performance, Hass et al. [15] computed the success count γ, which represents the number of “successful” optimization runs that reached a final objective function value sufficiently close (difference smaller than some threshold τ) to the lowest objective function value found by all methods, and divided that by the time to complete all optimization runs ttotal, a performance metric that was originally introduced by Villaverde et al. [49]. In this study, we replaced ttotal with the total number of gradient evaluations across all optimization runs ngrad for any specific optimization setting. The resulting performance metric ϕ = γ/ngrad ignores differences in computation time for gradients having different parameter values and prevents computer or simulator performance, node load and parallelization from influencing results. This provides a fairer evaluation of the algorithm or method itself and is particularly relevant when optimization is performed on computing clusters with heterogeneous nodes or when different numbers of threads are used to parallelize objective function evaluation, as was the case in this study. Since ϕ ignores potentially higher computation times for step-size computation, we confirmed that step-size computation times were negligible as compared to numerical integration of model and sensitivity equations. For all trust-region optimizers we studied, the number of gradient evaluations was equal to the number of iterations, with the exception of ls_trf, which only uses objective function evaluations when a proposed step is rejected. Thus, 1/ngrad is inversely proportional to the average number of iterations required for optimization to converge (with the number of optimization runs, 10³ in all settings, as the proportionality factor). We therefore refer to ν = 1/ngrad as the convergence rate. Performance ϕ is equal to the product of γ and ν.

To calculate γ, we used a threshold of τ = 2, which corresponds to the upper limit of the objective function value in cases in which a model cannot be rejected according to the AIC [50]. Similar to Hass et al. [15], we found that changing this convergence threshold did not have a significant impact on the performance comparison, but we provide analyses for the values τ = 0.05 (threshold used by Hass et al., divided by two to account for the difference in objective function scaling, Fig A in S1 Text) and τ = 5 (the threshold for rejection according to the AIC and BIC [21], Fig B in S1 Text) in the Supplementary Material.
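A small sketch of how the metrics defined above can be computed from the results of a multi-start run; the argument names and array layout are illustrative assumptions.

```python
import numpy as np

def performance_metrics(final_values, n_grad_per_run, j_min, tau=2.0):
    """Success count gamma, convergence rate nu and performance phi (Section 2.7).

    final_values: final objective values of all runs for one optimizer setting.
    n_grad_per_run: gradient evaluations per run; j_min: lowest value over all settings.
    """
    gamma = int(np.sum(np.asarray(final_values) <= j_min + tau))  # successful runs
    n_grad = int(np.sum(n_grad_per_run))  # total gradient evaluations
    nu = 1.0 / n_grad                     # convergence rate
    phi = gamma / n_grad                  # performance, phi = gamma * nu
    return gamma, nu, phi
```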

2.8 Extension of boundary constraints

For some performance evaluations, we extended parameter boundaries. Even though initial points are usually uniformly sampled in Θ, we did not modify the locations of initial points when extending bounds. The Schwen problem (Table 3) required a different approach, in which the bounds for the parameter fragments were not modified, as values outside the standard bounds were implausible.

Table 3. Summary of problem characteristics for benchmark examples, as characterized by Hass et al. [15].

Problem  nθ  nx  ny·nt  sloppy  identifiable
Bachmann 113 36 541
Beer 72 4 27132
Boehm 9 8 48
Brannmark 22 9 43
Bruno 13 7 77
Crauste 12 5 21
Fiedler 19 6 72
Fujita 19 9 144
Isensee 46 25 687
Lucarelli 84 43 1755
Schwen 30 11 286
Weber 36 7 135
Zheng 46 15 60

As previously reported [15], extending boundaries can expose additional minima having globally lower objective function values. Thus, the success count γ for optimization settings with normal boundaries was computed using the lowest objective function value Jmin found among all settings excluding those with extended boundaries. γ for optimization settings using extended boundaries was computed using the minimum of Jmin and the lowest objective function value found for that particular setting.

2.9 Statistical analysis of optimizer traces

During the statistical analysis of optimizer traces, we quantified several numerical values derived from numerical approximation of matrix eigenvalues with limited accuracy (due, for example, to limitations in floating point precision). It was therefore necessary to account for this limitation in numerical accuracy:

Singular Hessians: To numerically assess matrix singularity of Hessian approximations, we checked whether the condition number, computed using the numpy function numpy.linalg.cond, was larger than the inverse 1/ϵ of the floating point precision ϵ = numpy.spacing(1).

Negative eigenvalues: To numerically assess whether a matrix has negative eigenvalues we computed the smallest (λmin) and largest (λmax) eigenvalues of the untransformed Hessian approximation Bk using numpy.linalg.eigvals and checked whether the smallest eigenvalue had a negative value that exceeded numerical noise λmin(Bk) < −ϵ ⋅ |λmax(Bk)|.
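Both checks can be written in a few lines of NumPy; the sketch below mirrors the criteria described above, with function names of our own choosing.

```python
import numpy as np

eps = np.spacing(1)  # floating point precision

def is_numerically_singular(B):
    """Condition-number based singularity check for a Hessian approximation."""
    return np.linalg.cond(B) > 1.0 / eps

def has_negative_eigenvalues(B):
    """Check for negative eigenvalues beyond numerical noise."""
    eigenvalues = np.real(np.linalg.eigvals(B))
    return eigenvalues.min() < -eps * np.abs(eigenvalues.max())
```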

2.10 Implementation

fides is implemented as modular, object-oriented Python code. The subproblem, subproblem solvers, stepback strategies and Hessian approximations all have class-based implementations, making it easy to extend the code. Internally, fides uses SciPy [51] and NumPy [52] libraries to store vectors and matrices and perform linear algebra operations. To ensure access to state-of-the-art simulation and sensitivity analysis methods, we implemented an interface to fides in the parameter estimation toolbox pyPESTO, which uses AMICI [35] to perform simulation and sensitivity analysis via CVODES [32]. This approach also enabled import of biological parameter estimation problems specified in the PEtab [53] format.

2.11 Benchmark problems

To evaluate the performance of different optimizers, we use the benchmark problems (Table 3) introduced by Hass et al. [15]. As discussed in the introduction, these problems are excellent representatives of ODE-based biochemical models and include realistic experimental data for model calibration. We selected a subset of 13 out of the 20 models based on whether they can be encoded in the PEtab [53] format and imported into AMICI and pyPESTO. The exclusion of some models from Hass et al. [15] does not reflect a limitation of fides itself, as it supports optimization for any objective function that provides routines to compute its gradient. In summary, the Hass problem was excluded because it includes a negative initial simulation time, which is not supported by PEtab; the Raia model because it involves state-dependent values for σ, which are unsupported by AMICI; Merkle and Sobotta because they are missing SBML implementations; Swameye because it includes spline functions that are not supported by SBML; Becker because it involves multiple models, which is not supported by pyPESTO; and Chen because forward sensitivity analysis is prohibitively computationally expensive for this model. There is no evidence that the excluded models differ in any systematic way from the models we do consider.

All benchmark problems were previously published and included experimental data for model calibration as described in Table 3, which also provides a brief summary of numerical features of the various benchmarks. These problems cover a wide array of common model features such as preequilibration, log-transformation of observables, as well as parameter-dependent initial conditions, observable functions and noise models. A more detailed description of the biochemical systems described by these models is available in the supplemental material of the study by Hass et al. [15].

2.12 Simulation and optimization settings

We encountered difficulties reproducing some of the results described by Hass et al. [15] and therefore repeated evaluations using the latest version of data2dynamics. We deactivated the Bessel correction [15] and increased the function evaluation limit to match the iteration limit. Relative and absolute integration tolerances were set to 10⁻⁸. The maximum number of iterations for optimization was set to 10⁵. Convergence was assessed using only the step-size criterion, with a tolerance of 10⁻⁶, and the ls_trf code was modified such that the convergence criteria matched the implementation in the other optimizers. For all problems, we performed 10³ optimizer runs. To initialize optimization, we used the initial parameter values provided by Hass et al. [15].

2.13 Parallelization and cluster infrastructure

Optimization was performed on the O2 Linux High Performance Compute Cluster at Harvard Medical School, a typical academic High Performance Compute resource running SLURM. O2 includes 390+ compute nodes and 12,000+ compute cores, with the majority of the compute nodes built on Intel architecture; all nodes run CentOS 7.7.1908 Linux with MATLAB R2017a and python 3.7.4. Optimization for each model and each optimizer setting was run as a separate job. For MATLAB optimizers, optimization was performed using a single core per job. For Python optimizers, execution was parallelized on up to 12 cores. Parallelization was always carried out on an individual node, avoiding inter-node communication.

We observed severe load balancing issues due to skewed computational cost across optimization runs for the Bachmann, Isensee, Lucarelli and Beer models. To mitigate these issues, optimization for these models was parallelized over 3 threads using pyPESTO's MultiThreadEngine and simulation was parallelized over 4 threads using openMP multithreading in AMICI, resulting in a total parallelization over 12 threads. For the remaining models, optimization was parallelized using 10 threads without parallelization of simulations. Wall-time for each job was capped at 30 days (about 1 CPU year), which was only exceeded by the GNSBFGS and FX Hessian approximation schemes for the Lucarelli problem after 487 and 458 optimization runs respectively and also by the SSM Hessian approximation scheme for the Isensee problem after 973 optimization runs. Subsequent analysis was performed using partial results for those settings.

Compute times could not be reliably compared across methods and problems because it was necessary to use different degrees of parallelization for different problems and optimizations were run on different nodes with distinct processor models. We therefore evaluated performance based on the number of optimizer iterations. The number of iterations is independent of the degree of parallelization and processor models, but for any given implementation, the compute time will be proportional to the number of optimizer iterations.

3 Results

3.1 Validation and optimizer comparison

The implementation of trust-region optimization involves complex mathematical operations, which makes implementations error-prone. To validate the trust-region methods implemented in fides, we compared the performance of optimization using GN and GNe schemes against implementations of the same algorithm in MATLAB (fmincon, lsqnonlin) and Python (ls_trf). The subspace solvers and Hessian approximation used for each analysis are denoted by the notation implementation subspace/hessian.

We found that fides 2D/GN (blue) and fides 2D/GNe (orange) were the only methods that had non-zero performance (ϕ > 0) for all 13 benchmark problems (Fig 2A), with small performance differences between the two methods (0.72 to 1.12 fold difference, average 0.96). This established fides 2D/GN as a good reference implementation. In what follows we therefore report the performance of other methods relative to fides 2D/GN (Fig 2B). The ls_trf method outperformed fides 2D/GN on three problems (1.54 to 22.4-fold change; Boehm, Crauste, Zheng; purple arrows), had similar performance on one problem (1.15-fold change; Fiedler), exhibited worse performance on four problems (0.02 to 0.55-fold change; Brannmark, Bruno, Lucarelli, Weber) and did not result in successful runs (zero performance ϕ = 0) for the remaining five problems (Bachmann, Beer, Fujita, Isensee, Schwen). Decomposing performance improvements ϕ into increases in convergence rate ν (Fig 2C) and success count γ (Fig 2D) revealed that the increase in ϕ was primarily due to higher ν, which was observed for all but four problems (Bachmann, Bruno, Isensee, Schwen). However, in most cases, improvements in ν were canceled out by larger decreases in γ.

Fig 2. Comparison of MATLAB and Python optimizers.


Colors indicate optimizer setting and are the same in all panels. A: Performance comparison (ϕ), absolute values. B: Performance comparison (ϕ), values relative to fides 2D/GN. C: Increase in convergence count γ, values relative to fides 2D/GN. D: Increase in convergence rate ν, values relative to fides 2D/GN.

For fmincon (green) and lsqnonlin (red), ϕ was higher for one problem (2.34 to 2.94-fold change; Fiedler; red/green arrow), similar for two problems (0.92 to 1.00 fold change, Boehm, Bruno) and worse for the remaining 10 problems (0.00 to 0.80 fold change, Bachmann, Brannmark, Fujita, Isensee, Lucarelli, Schwen, Weber, Zheng), with zero performance on two problems (Beer, Crauste) (Fig 2B). Since we observed similar ϕ for fides 2D/GN (blue) and fides 2D/GNe (orange) on all problems, the differences in ϕ between fides 2D/GN and the other implementations are unlikely to reflect the use of GNe as opposed to a GN scheme. Instead, we surmised that the differences were due to discrepancies in the implementation of the Newton direction (this would explain the similarity for the two identifiable problems (Boehm, Bruno, Table 3)).

Overall, these findings demonstrated that trust-region optimization implemented in fides was more than competitive with the MATLAB optimizers fmincon and lsqnonlin and the Python optimizer ls_trf, outperforming them on a majority of problems. Simultaneously, our results demonstrate a surprisingly high variability in optimizer performance among methods that implement fundamentally similar mathematical operations (i.e., the same algorithm). This variability may explain some of the conflicting findings in previous studies that assumed that differences in optimizer performance arose from the “mathematics” rather than the computational implementation [54, 55].

3.2 Parameter boundaries and stepback strategies

One of the few changes in implementation that we deliberately introduced into the fides code was to allow multiple reflections during stepback from parameter boundary conditions [31]. In contrast, ls_trf, lsqnonlin and fmincon only allow a single reflection [48]. The modular design and advanced logging capabilities of fides make it straightforward to evaluate the impact of such modifications on optimizer performance ϕ and arrive at possible explanations for observed differences. For example, when we evaluated fides 2D/GN with single (orange) and multi-reflection (dark-green, Fig 3A–3C) implementations and correlated changes to ϕ with statistics of optimization trajectories (Fig 3D and 3E), we found that the single reflection performance ϕ was reduced on four problems (0.59 to 0.65-fold change; Beer, Lucarelli, Schwen, Zheng; orange arrows Fig 3A). Lower performance was primarily due to a decrease in convergence rate ν (Fig 3B and 3C). We attributed this behavior to the fact that a restriction on the number of reflections lowered the predicted decrease in objective function values for reflected steps. This, in turn, increased the fraction of iterations in which stepback yielded constrained Cauchy steps (Pearson’s correlation coefficient r = −0.85, p-value p = 2.3 ⋅ 10⁻⁴, Fig 3D) as well as the average fraction of boundary-constrained iterations (r = −0.82, p = 6 ⋅ 10⁻⁴, Fig 3E), both slowing convergence.

Fig 3. Evaluation of stepback strategies.


Colors indicate optimizer setting and are the same in all panels. All increases/decreases are relative to multi-reflection fides 2D/GN with normal bounds. A: Performance comparison (ϕ). B: Increase in convergence count γ. C: Increase in convergence rate ν. D: Association between increase in average fraction of constrained Cauchy steps with respect to total number of boundary constrained iterations and increase in ν for the single-reflection method. E: Association between increase in average fraction of boundary constrained iterations and increase in ν for the single-reflection method. F: Association between increase in average fraction of iterations with a numerically singular transformed Hessian B^k and decrease in γ for the multi-reflection method for fides 2D/GN with bounds extended by two orders of magnitude. G: Association between increase in average fraction of iterations with integration failures and increase in γ for multi-reflection fides 2D/GN without bounds.

A naive approach to addressing issues with parameter boundaries is to extend or remove them. We therefore repeated optimization with fides 2D/GN (multi-reflection) with parameter boundaries extended by one (blue) or two (pink) orders of magnitude or completely removed (light green). We found that extending boundaries by one order of magnitude reduced ϕ for 6 problems (0.12 to 0.74 fold change; Bachmann, Beer, Fiedler, Isensee, Lucarelli, Weber; blue arrows Fig 3A) and extending boundaries by two orders of magnitude reduced ϕ for an additional 4 problems (0.13 to 0.85 fold change; Bruno, Crauste, Schwen, Zheng; pink arrows Fig 3A). We found that decreased ϕ was primarily the result of lower ν (Fig 3B), which we attributed to a larger fraction of iterations in which the transformed Hessian $\hat{B}_k$ was singular (r = −0.78, p = 1.6 ⋅ 10⁻³, Fig 3F). Removing boundaries decreased ϕ for all problems, a result of lower values of γ, which we attributed to a higher fraction of iterations with integration failures (r = −0.82, p = 5.7 ⋅ 10⁻⁴, Fig 3G).

These findings demonstrate the importance and difficulty of choosing appropriate optimization boundaries, since excessively wide boundaries may lead to frequent integration failures and/or the creation of an ill-conditioned trust-region subproblem. In contrast, using boundaries that are too narrow has the potential to exclude the global optimum. When managing boundary constraints, the use of multi-reflection as compared to single-reflection yields a small performance increase, albeit one significantly smaller than the variation we observed (in the previous section) between different implementations of the same optimization algorithm.

3.3 Iterative schemes and negative curvature

To further study the positive effect of improving the conditioning of trust-region subproblems on the optimizer performance ϕ, we carried out optimization using BFGS and SR1 Hessian approximations. BFGS and SR1 can yield full-rank Hessian approximations, resulting in well-conditioned trust-region subproblems, even for non-identifiable problems. Moreover, in contrast to GN, these approximations converge to the true Hessian under mild assumptions [19]. The SR1 approximations can also account for directions of negative curvature and might therefore be expected to perform better when saddle-points are present.

We compared ϕ for fides 2D/GN (blue), fides 2D/BFGS (orange) and fides 2D/SR1 (green) (Fig 4A) and found that fides 2D/BFGS failed to reach the best objective function value for one problem (Beer) and fides 2D/SR1 for two problems (Beer, Fujita). Compared to fides 2D/GN, ϕ for fides 2D/BFGS was higher on three problems (2.02 to 9.88 fold change; Boehm, Fiedler, Schwen; orange arrows) and on four problems for fides 2D/SR1 (1.32 to 7.12 fold change; Boehm, Crauste, Fiedler; green arrows); it was lower for a majority of the remaining problems (BFGS 7 of 13 problems, 0.05 to 0.49 fold change; SR1 8 of 13 problems, 0.07 to 0.78 fold change). Decomposing ϕ into improvements in convergence rate ν (Fig 4B) and improvements in success counts γ (Fig 4C) revealed that SR1 improved ν for 7 problems (2.23 to 5.97 fold change; Beer, Boehm, Crauste, Fiedler, Fujita, Weber, Zheng; green arrows Fig 4B). Out of these 7 problems, BFGS improved ν for only four problems (2.33 to 6.63 fold change; Beer, Boehm, Fiedler, Zheng; orange arrows Fig 4B). We found that for both approximations, the increase in ν was correlated with the change in average fraction of iterations without trust-region radius (Δk) updates (BFGS: r = −0.8, p = 1.1 ⋅ 10⁻³; SR1: r = −0.61, p = 2.8 ⋅ 10⁻², Fig 4D). Δk is not updated when the predicted objective function decrease is in moderate agreement with the actual objective function decrease (0.25 < ρk < 0.75, see Section 2.3), likely a result of inaccurate approximations to the Hessian. Thus, the higher convergence rate ν of the SR1 and BFGS schemes was likely due to a more precise approximation of the objective function Hessian.

Fig 4. Evaluation of iterative Hessian approximation schemes.


Color scheme is the same in all panels. All increases/decreases are relative to fides 2D/GN unless otherwise noted. A: Performance comparison (ϕ). B: Increase in convergence count γ. C: Increase in convergence rate ν. D: Association between the increase in the average fraction of iterations without updates to the trust-region radius Δ_k and the increase in ν for fides 2D/SR1 (green) and fides 2D/BFGS (orange). E: Association between the average fraction of iterations where the smallest eigenvalue λ_min of the transformed Hessian B̂_k (SR1) is negative and ν for fides 2D/SR1. F: Association between the fraction of iterations without updates to B_k and the increase in ν relative to fides 2D/SR1 for fides 2D/BFGS. G: Association between the fraction of iterations with a numerically singular transformed Hessian B̂_k and the increase in γ for fides 2D/SR1 (green) and fides 2D/BFGS (orange). The change in γ was censored at a threshold of 10^−2 to permit visualization of models with γ = 0. H: Association between the fraction of iterations without updates to Δ_k and the increase in ν for fides ND/GN. I: Association between the change in the average fraction of iterations where the smallest eigenvalue λ_min of the transformed Hessian B̂_k (SR1) is negative and γ relative to fides 2D/SR1 for fides ND/SR1.

To better understand the origins of performance differences between BFGS and SR1, we analyzed the eigenvalue spectra of SR1 approximations and found that the SR1 convergence rate ν was correlated with the average fraction of iterations where the transformed Hessian approximation B̂_k had negative eigenvalues (r = −0.71, p = 6.7 ⋅ 10^−3, Fig 4E). This correlation suggests that directions of negative curvature approximated by the SR1 scheme tended not to yield good search directions; handling of negative curvature is therefore unlikely to explain the observed improvements in convergence rates. In contrast to negative eigenvalues, we found that the difference in ν between BFGS and SR1 was correlated with the average fraction of iterations in which the BFGS approximation did not produce an update (r = −0.88, p = 8.3 ⋅ 10^−5, Fig 4F). The BFGS approximation is not updated when the curvature condition is violated (see Section 2.4). Such a high fraction of iterations not prompting updates is surprising, since violation of the curvature condition is generally considered to be rare [19]. It is nonetheless a plausible explanation for lower convergence rates, since the BFGS approximation is not expected to always converge to the true Hessian under such conditions [19].
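
The update-skipping behaviour discussed here can be illustrated with standard dense quasi-Newton updates. The safeguards below (skipping BFGS when the curvature condition y^T s > 0 fails and skipping SR1 when its denominator is tiny) are textbook choices, sketched for illustration rather than taken from the fides code.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update B <- B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s).
    Skipped when the curvature condition y^T s > 0 is violated, which is
    exactly the situation counted in Fig 4F."""
    ys = float(y @ s)
    if ys <= 0.0:
        return B, False                      # curvature condition violated: no update
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / ys, True

def sr1_update(B, s, y, r_tol=1e-8):
    """Symmetric rank-one update; skipped when the denominator is too small
    to avoid numerical blow-up. The result may be indefinite, which is what
    allows SR1 to represent directions of negative curvature."""
    r = y - B @ s
    denom = float(r @ s)
    if abs(denom) < r_tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B, False
    return B + np.outer(r, r) / denom, True
```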

For three problems, the BFGS (orange arrows, Fig 4C) and/or SR1 (green arrows, Fig 4C) approximations increased γ (1.29 to 3.12 fold change; Boehm, Crauste, Schwen), but for most problems γ was reduced by more than two-fold (−∞ to 0.46 fold change; SR1+BFGS: Beer, Fiedler, Fujita, Isensee, Lucarelli, Weber, Zheng; BFGS: Bachmann; SR1: Brannmark), canceling the benefit of the faster convergence rate for many of these problems. Paradoxically, we found that convergence count changes were correlated with changes in the average fraction of iterations having ill-conditioned trust-region subproblems (BFGS: r = 0.86, p = 1.6 ⋅ 10^−4; SR1: r = 0.73, p = 4.8 ⋅ 10^−3, Fig 4G). Therefore, improved conditioning of the trust-region subproblem, unexpectedly, came at the cost of smaller regions of attraction for minima having low objective function values.

We complemented the analysis of 2D methods by evaluating the corresponding ND methods, which almost always performed worse than the 2D methods. We found that fides ND/GN outperformed fides 2D/GN on two problems (1.43 to 3.25-fold change; Crauste, Zheng) and performed similarly on one problem (0.99-fold change; Fiedler). fides ND/BFGS outperformed fides 2D/BFGS on three problems (1.16 to 1.37-fold change; Boehm, Fujita, Schwen) and performed similarly on three problems (1.01 to 1.06-fold change; Bachmann, Bruno, Fiedler). fides ND/SR1 outperformed fides 2D/SR1 on two problems (1.51 to 1.70-fold change; Crauste, Schwen). These results were surprising, since the use of 2D methods is generally motivated by lower computational cost, not better performance; the ND approach, in contrast to the 2D approach, yields an exact solution to the trust-region subproblem. For GN, the change in convergence rate ν was correlated with the change in the average fraction of iterations in which the trust-region radius Δ_k was not updated (r = −0.8, p = 1.1 ⋅ 10^−3, Fig 4H). For fides ND/SR1, the change in γ with respect to fides 2D/SR1 was correlated with the change in the average fraction of iterations in which B̂_k had negative eigenvalues (r = −0.72, p = 1.2 ⋅ 10^−2, Fig 4I). This suggests that inaccuracies in the Hessian approximation may have a stronger impact on ND methods than on 2D methods, offsetting their theoretical advantages.
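
For orientation, the sketch below shows what the 2D restriction means: the quadratic model is projected onto the subspace spanned by the gradient and a (pseudoinverse) Newton direction and minimized there. The brute-force search over the boundary is purely illustrative and is not how fides solves the reduced problem.

```python
import numpy as np

def solve_tr_subproblem_2d(B, g, delta, n_angles=720):
    """Minimize g^T p + 0.5 p^T B p over ||p|| <= delta, restricted to the
    two-dimensional subspace spanned by the gradient and the Newton step."""
    # orthonormal basis of span{g, -pinv(B) g}
    S = np.column_stack([g, -np.linalg.pinv(B) @ g])
    Q, _ = np.linalg.qr(S)
    B2, g2 = Q.T @ B @ Q, Q.T @ g                 # reduced 2x2 model
    model = lambda p: float(g2 @ p + 0.5 * p @ B2 @ p)
    candidates = []
    # an interior minimizer exists only if the reduced Hessian is positive definite
    if np.all(np.linalg.eigvalsh(B2) > 0):
        p_int = np.linalg.solve(B2, -g2)
        if np.linalg.norm(p_int) <= delta:
            candidates.append(p_int)
    # otherwise (or in addition) the minimizer lies on the boundary of the region
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        candidates.append(delta * np.array([np.cos(theta), np.sin(theta)]))
    best = min(candidates, key=model)
    return Q @ best                               # step in the full parameter space
```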

Overall, these results suggest that the BFGS and SR1 approximations can improve optimization performance through faster convergence, but often suffer from poorer global convergence properties and therefore rarely outperform the GN approximation. BFGS and SR1 perform similarly on most problems, with the exception of a few problems for which BFGS cannot be updated because the curvature condition is violated. We conclude that, while saddle points may be present in some problems, they do not appear to pose a major issue that can be resolved by using the SR1 approximation.

3.4 Hybrid switching approximation scheme

We hypothesized that the high success count γ of the GN approximation primarily arose in the initial phase of optimization, which determines the basin of attraction to which optimization converges. In contrast, the high convergence rate ν of the BFGS approximation seemed more likely to arise from a more accurate Hessian approximation in later phases of optimization, when convergence to the true Hessian is achieved. To test this idea, we designed a hybrid switching approximation that initially uses a GN approximation, but simultaneously constructs a BFGS approximation. As soon as the quality of the GN approximation becomes limiting, as indicated by a failure to update the trust-region radius for n_hybrid consecutive iterations, the hybrid approximation switches to the BFGS approximation for the remainder of the optimization run.
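
A minimal sketch of this switching logic is shown below, with 50 used as an illustrative default for n_hybrid; the bookkeeping and class interface are assumptions made for illustration and do not mirror the fides internals.

```python
import numpy as np

class HybridSwitchingApproximation:
    """Use the Gauss-Newton Hessian approximation until it becomes limiting,
    then switch permanently to a BFGS approximation built alongside it."""

    def __init__(self, n_par, n_hybrid=50):
        self.n_hybrid = n_hybrid
        self.bfgs = np.eye(n_par)   # BFGS matrix accumulated from the start
        self.stalled = 0            # consecutive iterations without a radius update
        self.switched = False

    def update(self, s, y, gn_hessian, radius_was_updated):
        # always refresh the BFGS matrix, skipping if the curvature condition fails
        ys = float(y @ s)
        if ys > 0.0:
            Bs = self.bfgs @ s
            self.bfgs += np.outer(y, y) / ys - np.outer(Bs, Bs) / float(s @ Bs)
        # count how long the GN model has failed to earn a trust-region update
        self.stalled = 0 if radius_was_updated else self.stalled + 1
        if self.stalled >= self.n_hybrid:
            self.switched = True    # GN quality is limiting: switch for good
        return self.bfgs if self.switched else gn_hessian
```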

We compared the hybrid switching approach using different values of n_hybrid (25, 50, 75, 100) to fides 2D/GN (equivalent to n_hybrid = ∞) and fides 2D/BFGS (equivalent to n_hybrid = 0). Evaluating optimizer performance ϕ, we found that the hybrid approach was successful for all problems, with n_hybrid = 50 performing best and improving ϕ by an average of 1.51 fold across all models (range: 0.56 to 6.34-fold change). The hybrid approach performed better than both fides 2D/GN and fides 2D/BFGS on 5 problems (1.71 to 6.34-fold change; Crauste, Fiedler, Fujita, Lucarelli, Zheng; + signs Fig 5A). It performed better than fides 2D/GN, but worse than fides 2D/BFGS, on only one problem (Boehm; (+) sign Fig 5A). The hybrid approach performed similarly to fides 2D/GN on 5 of the 7 remaining problems (0.89 to 1.03-fold change; Beer, Brannmark, Bruno, Isensee, Schwen; = signs Fig 5A). Decomposing ϕ into γ and ν, we found that hybrid switching resulted in higher ν for the four problems in which fides 2D/BFGS had higher ν than fides 2D/GN (Beer, Boehm, Fiedler, Zheng), as well as for three additional problems (Crauste, Fujita, Lucarelli). These were the same three problems for which SR1 had higher ν (Fig 4B) but BFGS did not, as a consequence of a high number of iterations without B_k updates (Fig 4F). Consistent with this interpretation, we confirmed that the hybrid approach had very few iterations without B_k updates. In contrast to BFGS, the hybrid switching approach maintained a γ similar to GN, meaning that higher ν generally translated into higher ϕ. Evaluating the overlap between start-points that yielded successful runs for GN showed a higher overlap for the hybrid switching approach than for BFGS (Fig 5D), suggesting that the higher γ of the hybrid switching approach was indeed the result of greater similarity in regions of attraction. These findings further corroborate that local convergence of fides 2D/GN is slowed by the limited approximation quality of GN.

Fig 5. Evaluation of hybrid switching approximation.


Color scheme is the same in all panels. All increases/decreases are relative to fides 2D/GN. A: Performance comparison (ϕ). B: Increase in convergence count γ. C: Increase in convergence rate ν. D: Overlap score for start-points that yield successful optimization runs with respect to fides 2D/GN.

4 Comparison of hybrid approximation schemes

Inaccuracies in the GN approximation have been previously discussed in the optimization literature and are known to lead to slow convergence and even divergence of optimization runs [39, 40]. Several methods have been proposed to address this issue in the context of non-zero residual problems. These include the Structured Secant Method (SSM) [42], the Totally Structured Secant Method (TSSM) [43], the hybrid scheme by Fletcher and Xu (FX) [45] and the Gauss-Newton Structured BFGS (GNSBFGS) approach [40]. All of these methods combine the GN and BFGS approximations in different ways (see Section 2.4).

We implemented support for these Hessian approximation schemes in fides and compared optimizer performance ϕ against fides 2D/GN and the best-performing hybrid switching method (n_hybrid = 50). We found that the hybrid switching method was among the best-performing methods (0.85 to 1.15-fold change) on a majority of problems (7 out of 13; Beer, Brannmark, Bruno, Crauste, Fujita, Lucarelli, Schwen; orange arrows Fig 6A) and was the only method other than fides 2D/GN that resulted in successful runs for all problems. fides 2D/GN was among the best performers on 6 problems (Beer, Boehm, Bruno, Isensee, Schwen, Weber; dark green arrows Fig 6A), whereas GNSBFGS was among the best performers for three problems (Beer, Boehm, Zheng; yellow arrows Fig 6A) and FX for one problem (Fiedler; purple arrows Fig 6A). Both GNSBFGS and FX failed for one problem (Lucarelli). SSM and TSSM were among the best performers on one problem (Bachmann) and failed on one problem (Isensee). We conclude that the hybrid switching method is the most reliable and efficient method among all methods that we tested.

Fig 6. Comparison of hybrid approximation schemes.


Color scheme is the same in all panels. All increases/decreases are relative to fides 2D/GN. A: Performance comparison (ϕ). B: Increase in convergence count γ. C: Increase in convergence rate ν.

5 Discussion

In this paper we evaluated the properties of trust-region methods that affect the performance of parameter estimation for ODE models of cellular biochemistry. We used a previously described corpus of 13 models and the associated experimental data as a testbed relevant to many problems encountered in the application of dynamical models in biomedicine. The evaluation was made possible by the re-implementation of a MATLAB algorithm originally described by Coleman and Li [31]. The resulting fides toolbox also implements advanced logging capabilities that permit detailed analysis of optimization traces. We then compared success counts γ and convergence rates ν to the numerical properties of optimization traces across multiple models. This analysis prompted us to develop a novel hybrid switching scheme that uses two approaches for Hessian approximation: the Gauss-Newton approximation early in a run (when the basin of attraction is being determined) and the BFGS approximation later in a run (when a fast convergence rate to the local minimum is crucial). For many, but not all, problems in our test corpus, we found that fides in combination with hybrid switching exhibited the best performance and resolved issues with inconsistent final objective function values.

Overall, we were unable to identify a single uniformly superior optimization approach, in line with the well-known “no free lunch” theorem of optimization [10]. Hybrid switching improved average performance and was superior for the majority of problems, but there remained a minority for which other hybrid methods performed substantially better. This heterogeneity highlights the existence of several distinct problem classes, but we have thus far been unable to identify their essential properties. We anticipate that future innovation in optimization methods for biochemical models will likely be driven either by a better understanding of how differences in optimizer performance relate to model structure, enabling a priori selection of the best numerical approach, or by new ways of analyzing the numerical properties of optimization traces, driving the development of new adaptive methods. Until then, the availability of multiple Hessian approximation schemes and trust-region subproblem solvers in fides will be of general utility for a range of models. We have used it ourselves with large ODE-based models that are at the limit of what can currently be calibrated in practice [56]. Similarly, fides will be a sound foundation for the development of new and better optimization methods.

Our findings suggest that issues previously encountered with fmincon and lsqnonlin are likely due to premature optimizer termination and not to “rugged” objective function landscapes (a situation in which many similar local minima are present). Our results also corroborate previous findings showing that the use of Gauss-Newton approximations can be problematic for optimization problems featuring sloppy models [4]. However, we did not find improved performance with the SR1 scheme, which can handle saddle points. The inconsistent and often poor performance of the BFGS and SR1 schemes was also unexpected, but our findings suggest that the problem arises from the global convergence properties of BFGS and SR1, as revealed by lower convergence counts γ. Global convergence properties depend on the shape of the objective function landscape and are therefore expected to be problem-specific. BFGS and SR1 may therefore perform better when combined with hybrid global-local methods such as scatter search [57], which benefit substantially from good local convergence [49] but are less dependent on global convergence. Moreover, the SR1 and BFGS schemes enable the use of trust-region optimization for problems in which the GN approximation is not applicable, such as when a non-Gaussian error model is used [33] or when gradients are computed using adjoint sensitivities [36], which is particularly relevant for large multi-pathway models with many parameters [58].

We were surprised to observe that differences in the performance of distinct numerical implementations of the same fundamental mathematical instructions (algorithm) could be greater than the differences between distinct algorithms. We propose that these unexpected differences arise from how numerical edge cases are handled. For example, fides uses a Moore-Penrose pseudoinverse to compute the Newton search direction for the 2D subproblem solver, while fmincon uses a damped Cholesky decomposition. Another possible source of difference is the use of different simulation and sensitivity computation routines. While both Data2Dynamics and AMICI employ CVODES [32] for simulation and computation of parameter sensitivities, there may be slight differences in the implementation of advanced features such as event handling and pre-equilibration. Overall, these findings demonstrate the complexity of comparing trust-region methods and the impact of subtle differences in numerical methods on optimizer performance. Thus, the benchmarking of different optimization algorithms requires consistent implementation within a single framework. This consistency is likely to have practical benefit for individuals interested in developing new optimization methods. It is also possible that the superior performance exhibited by fides will generalize to optimization problems other than biochemical models.
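
The following sketch illustrates how this particular implementation detail can change the computed search direction when the Hessian approximation is singular; it is a simplified illustration, not the actual fides or fmincon code.

```python
import numpy as np

def newton_direction_pinv(B, g):
    """Newton direction via the Moore-Penrose pseudoinverse: for singular B
    this is the minimum-norm solution of B p = -g."""
    return -np.linalg.pinv(B) @ g

def newton_direction_damped_cholesky(B, g, tau0=1e-3, max_tries=50):
    """Newton direction via Cholesky with increasing diagonal damping tau,
    retried until B + tau*I is positive definite."""
    tau = 0.0
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(B + tau * np.eye(B.shape[0]))
            return -np.linalg.solve(L.T, np.linalg.solve(L, g))
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)
    raise RuntimeError("damping did not yield a positive definite matrix")

B = np.diag([1.0, 0.0])                         # rank-deficient Hessian approximation
g = np.array([1.0, 1.0])
print(newton_direction_pinv(B, g))              # [-1.  0.]  (minimum-norm direction)
print(newton_direction_damped_cholesky(B, g))   # [~-1, -1000]: large component along the null space
```

Although a trust region would truncate the oversized component, the two approaches can still produce differently oriented steps for ill-conditioned problems, which is plausibly enough to change which local minimum an individual run converges to.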

Overall, our results demonstrate that fides not only finds better solutions to parameter estimation problems for ODE-based biochemical models when state-of-the-art algorithms fail, but also performs on par with, or better than, established methods on problems where they already find good solutions. The modular and flexible implementation of fides, and its interoperability with other toolboxes that facilitate the import of PEtab problems, is expected to drive its adoption within the systems biology community as a preferred means of performing parameter estimation.

Supporting information

S1 Text. Performance comparison using different values for the consistency thresholds τ.

(PDF)

Acknowledgments

We thank the O2 High Performance Compute Cluster at Harvard Medical School for computing support and Carolin Loos, Daniel Weindl, Dilan Pathirana, Elba Raimundez, Erika Dudkin, Jan Hasenauer and Leonard Schmiester for providing benchmark examples in PEtab format; we also thank Jan Hasenauer, Juliann Tefft and Edward Novikov for feedback on the manuscript and Daniel Weindl for feedback on the implementation of fides.

Data Availability

Fides is published under the permissive BSD-3-Clause license with source code publicly available at https://github.com/fides-dev/fides. Citeable releases are archived on Zenodo. Code to reproduce results presented in this manuscript is available at https://github.com/fides-dev/fides-benchmark.

Funding Statement

This work was supported by the Human Frontier Science Program (Grant no. LT000259/2019-L1; F.F.), and the National Cancer Institute (Grant no. U54-CA225088; P.K.S). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Kitano H. Computational Systems Biology. Nature. 2002;420(6912):206–210. doi: 10.1038/nature01254
2. Klipp E, Herwig R, Kowald A, Wierling C, Lehrach H. Systems biology in practice. Wiley-VCH, Weinheim; 2005.
3. Fröhlich F, Loos C, Hasenauer J. Scalable Inference of Ordinary Differential Equation Models of Biochemical Processes. In: Sanguinetti G, Huynh-Thu VA, editors. Gene Regulatory Networks: Methods and Protocols. Methods in Molecular Biology. New York, NY: Springer; 2019. p. 385–422.
4. Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP. Universally sloppy parameter sensitivities in systems biology models. PLOS Computational Biology. 2007;3(10):1871–1878. doi: 10.1371/journal.pcbi.0030189
5. Aldridge BB, Burke JM, Lauffenburger DA, Sorger PK. Physicochemical modelling of cell signalling pathways. Nature Cell Biology. 2006;8(11):1195–1203. doi: 10.1038/ncb1497
6. Ballnus B, Schaper S, Theis FJ, Hasenauer J. Bayesian parameter estimation for biochemical reaction networks using region-based adaptive parallel tempering. Bioinformatics. 2018;34(13):i494–i501. doi: 10.1093/bioinformatics/bty229
7. Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25(25):1923–1929. doi: 10.1093/bioinformatics/btp358
8. Loos C, Moeller K, Fröhlich F, Hucho T, Hasenauer J. A Hierarchical, Data-Driven Approach to Modeling Single-Cell Populations Predicts Latent Causes of Cell-To-Cell Variability. Cell Systems. 2018;6(5):593–603.e13. doi: 10.1016/j.cels.2018.04.008
9. Steiert B, Timmer J, Kreutz C. L1 regularization facilitates detection of cell type-specific parameters in dynamical systems. Bioinformatics. 2016;32(17):i718–i726. doi: 10.1093/bioinformatics/btw461
10. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation. 1997;1(1):67–82. doi: 10.1109/4235.585893
11. Bongartz I, Conn AR, Gould N, Toint PL. CUTE: constrained and unconstrained testing environment. ACM Transactions on Mathematical Software. 1995;21(1):123–160. doi: 10.1145/200979.201043
12. Gould NIM, Orban D, Toint PL. CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Transactions on Mathematical Software. 2003;29(4):373–394. doi: 10.1145/962437.962439
13. Gould NIM, Orban D, Toint PL. CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization. Computational Optimization and Applications. 2015;60(3):545–557. doi: 10.1007/s10589-014-9687-3
14. Villaverde AF, Henriques D, Smallbone K, Bongard S, Schmid J, Cicin-Sain D, et al. BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology. BMC Systems Biology. 2015;9(8). doi: 10.1186/s12918-015-0144-4
15. Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics. 2019;35(17):3073–3082. doi: 10.1093/bioinformatics/btz020
16. Abdulla UG, Poteau R. Identification of parameters for large-scale kinetic models. Journal of Computational Physics. 2021;429:110026. doi: 10.1016/j.jcp.2020.110026
17. Abdulla UG, Poteau R. Identification of parameters in systems biology. Mathematical Biosciences. 2018;305:133–145. doi: 10.1016/j.mbs.2018.09.004
18. Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, et al. Lessons learned from quantitative dynamical modeling in systems biology. PLoS ONE. 2013;8(9):e74335. doi: 10.1371/journal.pone.0074335
19. Nocedal J, Wright S. Numerical optimization. Springer Science & Business Media; 2006.
20. Fujita KA, Toyoshima Y, Uda S, Ozaki Yi, Kubota H, Kuroda S. Decoupling of Receptor and Downstream Signals in the Akt Pathway by Its Low-Pass Filter Characteristics. Science Signaling. 2010;3(132):ra56. doi: 10.1126/scisignal.2000810
21. Burnham KP, Anderson DR. Model selection and multimodel inference: A practical information-theoretic approach. 2nd ed. New York, NY: Springer; 2002.
22. Kreutz C. Guidelines for benchmarking of optimization-based approaches for fitting mathematical models. Genome Biology. 2019;20(1):281. doi: 10.1186/s13059-019-1887-9
23. Transtrum MK, Machta BB, Sethna JP. Geometry of nonlinear least squares with applications to sloppy models and optimization. Physical Review E. 2011;83(3):036701. doi: 10.1103/PhysRevE.83.036701
24. Tönsing C, Timmer J, Kreutz C. Optimal Paths Between Parameter Estimates in Non-linear ODE Systems Using the Nudged Elastic Band Method. Frontiers in Physics. 2019;7.
25. Dauphin YN, Pascanu R, Gulcehre C, Cho K. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in Neural Information Processing Systems 26; 2014. p. 2933–2941.
26. Broyden CG. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations. IMA Journal of Applied Mathematics. 1970;6(1):76–90. doi: 10.1093/imamat/6.1.76
27. Fletcher R. A new approach to variable metric algorithms. The Computer Journal. 1970;13(3):317–322. doi: 10.1093/comjnl/13.3.317
28. Goldfarb D. A Family of Variable-Metric Methods Derived by Variational Means. Mathematics of Computation. 1970;24(109):23–26. doi: 10.1090/S0025-5718-1970-0258249-6
29. Shanno DF. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation. 1970;24(111):647–656. doi: 10.1090/S0025-5718-1970-0274030-6
30. Conn AR, Gould NIM, Toint PL. Convergence of quasi-Newton matrices generated by the symmetric rank one update. Mathematical Programming. 1991;50(1):177–195. doi: 10.1007/BF01594934
31. Coleman TF, Li Y. On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds. Mathematical Programming. 1994;67(1):189–224. doi: 10.1007/BF01582221
32. Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, et al. SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers. ACM Transactions on Mathematical Software. 2005;31(3):363–396. doi: 10.1145/1089014.1089020
33. Maier C, Loos C, Hasenauer J. Robust parameter estimation for dynamical systems from outlier-corrupted data. Bioinformatics. 2017;33(5):718–725.
34. Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, et al. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015;31(21):3558–3560. doi: 10.1093/bioinformatics/btv405
35. Fröhlich F, Weindl D, Schälte Y, Pathirana D, Paszkowski L, Lines GT, et al. AMICI: High-Performance Sensitivity Analysis for Large Ordinary Differential Equation Models. Bioinformatics. 2021;37(20):3676–3677. doi: 10.1093/bioinformatics/btab227
36. Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS Computational Biology. 2017;13(1):1–18. doi: 10.1371/journal.pcbi.1005331
37. Stapor P, Fröhlich F, Hasenauer J. Optimization and profile calculation of ODE models using second order adjoint sensitivity analysis. Bioinformatics. 2018;34(13):i151–i159. doi: 10.1093/bioinformatics/bty230
38. Raue A. Quantitative Dynamic Modeling: Theory and Application to Signal Transduction in the Erythropoietic System. University of Freiburg; 2013.
39. Al-Baali M, Fletcher R. Variational Methods for Non-Linear Least-Squares. Journal of the Operational Research Society. 1985;36(5):405–421. doi: 10.2307/2582880
40. Zhou W, Chen X. Global Convergence of a New Hybrid Gauss–Newton Structured BFGS Method for Nonlinear Least Squares Problems. SIAM Journal on Optimization. 2010;20(5):2422–2441. doi: 10.1137/090748470
41. Dennis JE, Walker HF. Convergence Theorems for Least-Change Secant Update Methods. SIAM Journal on Numerical Analysis. 1981;18(6):949–987. doi: 10.1137/0718067
42. Dennis JE, Martinez HJ, Tapia RA. Convergence theory for the structured BFGS secant method with an application to nonlinear least squares. Journal of Optimization Theory and Applications. 1989;61(2):161–178. doi: 10.1007/BF00962795
43. Huschens J. On the Use of Product Structure in Secant Methods for Nonlinear Least Squares Problems. SIAM Journal on Optimization. 1994;4(1):108–129. doi: 10.1137/0804005
44. Dennis JE, Gay DM, Welsch RE. Algorithm 573: NL2SOL—An Adaptive Nonlinear Least-Squares Algorithm. ACM Transactions on Mathematical Software. 1981;7(3):369–383. doi: 10.1145/355958.355966
45. Fletcher R, Xu C. Hybrid Methods for Nonlinear Least Squares. IMA Journal of Numerical Analysis. 1987;7(3):371–389. doi: 10.1093/imanum/7.3.371
46. Moré JJ. The Levenberg-Marquardt algorithm: Implementation and theory. In: Lecture Notes in Mathematics. vol. 630. Springer Berlin Heidelberg; 1978. p. 105–116.
47. Byrd RH, Schnabel RB, Shultz GA. Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Mathematical Programming. 1988;40(1):247–263.
48. Coleman TF, Li Y. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization. 1996;6(2):418–445. doi: 10.1137/0806023
49. Villaverde AF, Fröhlich F, Weindl D, Hasenauer J, Banga JR. Benchmarking optimization methods for parameter estimation in large kinetic models. Bioinformatics. 2019;35(5):830–838. doi: 10.1093/bioinformatics/bty736
50. Jeffreys H. Theory of Probability. 3rd ed. Oxford: Oxford University Press; 1961.
51. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods. 2020;17(3):261–272. doi: 10.1038/s41592-019-0686-2
52. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–362. doi: 10.1038/s41586-020-2649-2
53. Schmiester L, Schälte Y, Bergmann FT, Camba T, Dudkin E, Egert J, et al. PEtab—Interoperable specification of parameter estimation problems in systems biology. PLOS Computational Biology. 2021;17(1):e1008646. doi: 10.1371/journal.pcbi.1008646
54. Schmiester L, Schälte Y, Fröhlich F, Hasenauer J, Weindl D. Efficient parameterization of large-scale dynamic models based on relative measurements. Bioinformatics. 2020;36(2):594–602.
55. Degasperi A, Fey D, Kholodenko BN. Performance of objective functions and optimisation procedures for parameter estimation in system biology models. npj Systems Biology and Applications. 2017;3(1):1–9. doi: 10.1038/s41540-017-0023-2
56. Fröhlich F, Gerosa L, Muhlich J, Sorger PK. Mechanistic model of MAPK signaling reveals how allostery and rewiring contribute to drug resistance. bioRxiv; 2022. Available from: https://www.biorxiv.org/content/10.1101/2022.02.17.480899v1.
57. Egea JA, Rodriguez-Fernandez M, Banga JR, Marti R. Scatter search for chemical and bio-process optimization. Journal of Global Optimization. 2007;37(3):481–503. doi: 10.1007/s10898-006-9075-3
58. Fröhlich F, Kessler T, Weindl D, Shadrin A, Schmiester L, Hache H, et al. Efficient Parameter Estimation Enables the Prediction of Drug Response Using a Mechanistic Pan-Cancer Pathway Model. Cell Systems. 2018;7(6):567–579.e6. doi: 10.1016/j.cels.2018.10.013
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010322.r001

Decision Letter 0

Olivier Morin

1 Nov 2021

PONE-D-21-29213Unsupervised Deep Learning Supports Reclassification of Bronze Age Cypriot Writing SystemPLOS ONE

Dear Silvia Ferrara,

Thank you for submitting your manuscript for consideration at PLOS ONE. I now have in hand reports from two reviewers — one a specialist in computational linguistics and machine learning, the other a specialist of the Cypro-Minoan script with a balanced view of current debates. Both reviewers find merit in this highly original and innovative paper, but recommend that you make extensive changes in order for it to be suitable for publication. I am inviting you to submit a revision which will then be sent back to the same two reviewers. In case the reviews reveal strong disagreements over publication, or new issues, I may contact a third reviewer in addition to those two.

Reviewer 1 provides a great deal of technical advice worth following, especially concerning a relative lack of clarity and exhaustivity in explaining the methods. One concern that I share with them has to do with the validation of your model on cursive Hiragana, which yields lackluster results, calling into question the subsequent applicability of the model to the Cypro-Minoan data. This, in my view, is the number one issue raised by your paper. Please address it in depth, by explaining why the model may yield reliable conclusions in spite of its limited aplicability to a better known and better documented script, and by qualifying your conclusions accordingly.

Like Reviewer 1, I noticed that your paper was not accompanied by open data and code, and that you declared some restrictions would apply to the sharing of the data. Given the highly technical and innovative nature of your study, I do think that giving Reviewer 1 access to your data and code is important to let them appreciate the robustness of the results.

Reviewer 2 confesses serious misgivings about your raw data, but also notes that your conclusion is plausible, and indeed can be defended on other grounds. Please address their thorough and detailed comments; they are mostly dealing with the quality and completeness of the sources. I share their last remark on the fact the three versions of Cypro-Minoan might be one and the same script, without necessarily encoding one and the same language.

Thank you again for allowing us to consider your manuscript.

Olivier Morin

P.S. Please bear in mind this standard caveat if and when you revise the paper: Inviting a revision does not entail that the next version, or any subsequent version, will be accepted for publication. It is my policy to avoid a protracted editorial process that may in any case end in rejection. I am not pre-judging this particular case but this is something I warn all authors of.

Please submit your revised manuscript by Dec 16 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

[Note: HTML markup is below. Please do not edit.]

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper is clearly written and has a straightforward research question, which aims at investigating if three subgroups of the Cypro-Minoan script are the same language or not. The methods used in the paper are relevant for the research question and well described. The appreciation toward the paper is generally positive. However, some revisions could be made to clarify more details, which is why the reviewer suggests: Accept with (major?) revisions.

General comments:

- The title mentions 'deep learning' but within the text, the term changes between 'machine learning' (e.g., also the term ‘machine-based techniques on p2) and 'deep learning'. I suggest to synchronize which term to use when referring to the methods. IMHO, both methods are used in the paper, e.g., k-means is more likely to be affiliated to the machine learning category while neural network is more likely to be affiliated to the deep learning category.

- I understand that the authors have concern to release detailed code and data upon acceptance, but in the current state it is hard to judge how robustly was the analysis conducted. For example, there is not much details about the detailed settings of the parameters and few information from the attached supplementary tables can be used to interpret the robustness of the analysis. The description of the method is well-written though, so the editors may decide if the code is needed for reviewing or not.

- P3: Is there a table or a part of text giving the distribution of the three scripts in the used data? I could not find the information in the text or in the supplementary materials (sorry if I missed it). My follow-up question on the distribution would be: If there is a lack of balance in the data, does this lack of balance between the three scripts have an effect on the output of the experiments?

- P4-P5: “We tested various supervised and unsupervised deep learning models … Our preliminary experiments found ….” If these additional experiments are mentioned, their procedure and output should probably be provided somewhere, either in the text or in the supplementary materials.

- P5: “The model therefore tries to reconstruct the category to which the sign belongs, both from the context (preceding/following sign) and from the sign itself.” This is a cool idea! A quick question though: if the vector model considers the context of each character, isn’t it inherently biased toward a separation of the three scripts? Since only scripts from the same category will occur together?

- P6: for sign2vec, did you consider different window size and type? E.g., three surrounding characters instead of one? Or only considering the words before/after rather than symmetric context? What is the dimension of the output vector? E.g., 50, 100, 500? Sorry if it is already written somewhere and I missed it.

- P6 Table 1: I understand that the authors are considering the Rand index, which is easily affected by the size of different clusters between the predicted and the actual data. Maybe the adjusted Rand index could be considered? Plus, the definition of the metrics listed in Table 1 could be explained. If the journal was a CS or CL journal that might not be necessary, but since PLOS has a larger audience, I suggest to add some brief explanations about those metrics.

- P6 “The scores were not so high because…” If I understand correctly the flow of this section, the authors wanted to validate the model on the Japanese writing system. If the results are not conclusive for Japanese, how do the authors show that the model is reliable?

- P8 “To evaluate our model, we could only use as ground truth a set of 37 signs” I might be a bit confused here. If only those signs were used to evaluate the model, why include the other signs? This is probably already written somewhere in the text but I might have missed it.

- P9 “This demonstrates that, while the vector is not 100% accurate, it is still a reliable method to test the hypothesis that some signs allegedly exclusive to CM1 and CM2 are in reality paleographic variants.” Would it be possible to compare the accuracy obtained here with a random/majority baseline to be able to assess how high or low is the accuracy?

- P11 “These results strengthen the hypothesis that the division of CM in three sub-scripts is invalid, as previously put forward on the basis of paleographic and structural evidence. The implications are of paramount importance for the script,” AFAIU, since the results do not provide a clear-cut (e.g., the accuracy of the models is not very high), I suggest that the authors could be a bit more modest when mentioning the impact of the results. The limitations of the study should also be mentioned somewhere in the conclusion, e.g., the distribution of data? The accuracy of the models and its implication on the interpretability of the results, etc…

Minor comments:

- P1, abstract: “assess if it holds up against a multi-pronged, multi-disciplinary attack”, I suggest to avoid using too strong terms such as 'attack'. However, that might be a personal preference.

- P1: If space allows it, a map showing the location of the sites where the inscriptions have been found could be helpful for readers not familiar with the topic.

- P4: “Almost all neural systems treating images in some ways are based on CNN, thus they seemed most fitting to our ends.” While I agree with the authors, a few references here would be nice to support this statement.

- P5 “we applied some quantitative measures using the MNIST dataset confirming” What are those measures again? I might have missed it.

- P5: I suggest avoiding sentences such as “as mentioned above” in the paper, if you do, please refer to the exact location/section in the text.

- P6: Finally, we combine the DeepCluster-v2 loss, … this is a bit abstract to follow IMHO. Maybe a toy example would help?

- Figure 6 and Figure 7 are hard to interpret visually. Maybe replacing the characters with points and using shapes/colors to distinguish the characters would make it easier to read?

- The format of the refs should be synchronized, for example: [2,15]: The page number seems to be missing, [10,11,30]: The publisher is missing. If the place is required as in [39], it should be added for the other references too.

Reviewer #2: The central question posed in this paper is an old one, and there seems to me some potential to try to address it with new methods of the sort proposed. However, I have serious misgivings about the way in which this research has been conducted. I hope that my specific comments below will demonstrate the grounds for my misgivings, and the reasons why on balance I felt compelled to record that the data (or rather the way in which the data were analysed) do not appear to support the conclusions offered. Unless the authors can address these issues seriously, I fear that the paper comes across as a superficial ‘confirmation’ of pre-existing theories that may otherwise be quite adequately argued via other methods.

P1: The summary of CM inscriptions overlooks at least one further inscription from Tiryns, on the handle of a clay vessel, published by Brent Davis – this work is even on the bibliography (no. 20)! There is also a new potmark from the same site which I believe will be published by the same author.

P3 L64: It seems misrepresentative to say that signs not attested in the tiny repertoire of CM3 were ‘allegedly discarded for linguistic reasons’ (L65). Masson and Olivier both seem to have accepted that there could be signs that simply have yet to be attested in the corpus from Ras Shamra.

It is also worth noting that Olivier was openly sceptical of any linguistic distinction for CM3, making clear in Olivier 2007 that the designation is nothing more than geographical, and often using scare quotes for it (‘CM3’) – even though he maintained Masson’s categorisation.

P4 L106-8: The possibility that some inscriptions at the end of the chronological timespan for Cypro-Minoan might actually be written in the Cypro-Greek syllabary is raised here without any critical commentary on the implications of such an assumption. These documents could be excluded on chronological grounds, but the authors should ideally take some position on their epigraphic status (whether agnostic or not). There have been several recent discussions of the problem, including e.g.:

Duhoux, Y. (2012) ‘The most ancient Cypriot text written in Greek: The Opheltas’ spit’, Kadmos 51, 71-91.

Egetmeyer, M. (2013) ‘From the Cypro-Minoan to the Cypro-Greek syllabaries: linguistic remarks on the script reform’ in Steele, P.M. (ed.) Syllabic Writing on Cyprus and its Context, Cambridge 2012, 107-131.

Egetmeyer, M. (2017) ‘Script and language on Cyprus during the Geometric Period: An overview on the occasion of two new inscriptions’ in Steele, P.M. (ed.) Understanding Relations Between Scripts: The Aegean Writing Systems, Oxford, 108-201.

Steele, P. (2018) Writing and Society in Ancient Cyprus, Cambridge, second chapter.

P4 L111ff: Excluding signs on an essentially linguistic basis is methodologically worrying (the reasoning is repeated on P12). Whether or not there exist arguments in favour of linguistic differentiation, any study of sign shapes / palaeography should be blind to linguistic considerations – which surely is what the authors intend by pursuing the kinds of analysis on offer in this paper.

There might be some sense in excluding all the material from Ras Shamra simply on the grounds that writing practices at that site could be somewhat different from those on Cyprus – though, on the other hand, this might be a good reason for including them. But it must be all or nothing, and the methods employed here cannot seriously investigate the possibility or otherwise that CM3 should be considered as a separate entity from the rest of the CM corpus if tablets #212 and #215 are excluded (and along with them, six sign shapes thus not represented among the data used for this study).

P4 L124ff: Given the aim to achieve more neutral analysis of palaeographic variation in Cypro-Minoan, it is a shame that the authors used published drawings, presumably largely from Olivier where some examples could be criticised as to their representation of features. Those drawings also tend to flatten some kinds of variation owing to palaeographic factors, such as the comparative width of strokes*. Perhaps it is impossible for the present study, but the results of ongoing scanning projects could be particularly beneficial to this kind of analysis because of their more accurate measurement of sign features. There is nevertheless a risk here that the results of the analysis will be affected by pre-existing assumptions and biases on the part of the person who drew the signs, given that any drawing is already in itself interpretive.

*Considerations of this kind indeed seem to have affected the analysis given the divergent clustering of signs on clay documents and signs on other supports, as noted by the authors at P8-9.

P9 L316-8: “This property supports the argument that CM2 is not a script distinct from CM1, but rather a form of the same writing system that differs mainly due to the use of a different writing medium as well as scribal style (smaller and more angular signs).”

This seems to me to be quite a bold claim (not that CM2 is not a separate script in its own right, which is surely at some level true, but that the present investigation can be used as evidence for such a position). I am not convinced that the results can only be read in this way. For one thing, it may be that the quite consistent way in which CM2 signs were drawn (presumably by Olivier?) predisposed them to a differential analysis by the neural network – as I mentioned above, this is a serious risk to the results of the study and needs to be considered carefully.

It would also be helpful to know to what extent differences of scale have been factored in. The signs of the CM2 tablets are far smaller than signs on many other supports, and this makes a difference a) to what it was possible for the author to render, and b) to the accuracy of any modern drawing of the signs. Published editions tend to flatten the degree of difference in size between signs in different inscriptions, but this could indeed be a significant factor in their recognisability (whether to ancient humans or modern computational methods).

P10-11: In the section ‘Application of the Vector’, it is clear that the authors seem to have drawn conclusions that supported pre-held beliefs, but very little information is given as to how the conclusions are supported. Accuracy levels such as 6/10, 7/10, 2/3, 3/3 need to be explained in some detail – what exactly is denoted by these numbers, and what does ‘accuracy’ mean here? Have the results been tested for statistical significance?

P12 L449-451: “If the inscriptions in our dataset (mainly CM1 and CM2) represent the same script, then the likelihood increases that this single script recorded the same language.”

This is an extremely bold and methodologically unsound claim. There are countless examples across the world and across different time periods of different languages being written in a single script / writing system. The language-related considerations offered here do not seem appropriate to the purposes of the paper.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Philippa M. Steele

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS Comput Biol. 2022 Jul 13;18(7):e1010322. doi: 10.1371/journal.pcbi.1010322.r002

Author response to Decision Letter 0


12 May 2022

We thank the reviewers for their invaluable feedback. Thanks to their comments, the revised manuscript has improved both in the experimental settings and the content in numerous ways. Due to the sheer amount of feedback from the editor and reviewers, we provide a detailed response to each question in a separate file included in the revised submission.

Attachment

Submitted filename: Response to Reviewers.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010322.r003

Decision Letter 1

Olivier Morin

24 May 2022

Unsupervised Deep Learning Supports Reclassification of Bronze Age Cypriot Writing System


PLoS Comput Biol. doi: 10.1371/journal.pcbi.1010322.r004

Acceptance letter

Hugues Berry

22 Jun 2022


Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Performance comparison using different values for the consistency thresholds τ.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    Fides is published under the permissive BSD-3-Clause license with source code publicly available at https://github.com/fides-dev/fides. Citeable releases are archived on Zenodo. Code to reproduce results presented in this manuscript is available at https://github.com/fides-dev/fides-benchmark.
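    For readers who want to try the optimizer on their own problems, the sketch below illustrates one way fides might be invoked on a toy bound-constrained problem. This is a minimal, hypothetical example: the Optimizer constructor arguments (an objective returning value and gradient, lb/ub bound vectors, hessian_update) and the BFGS approximation class are assumptions based on the repository documentation rather than a definitive specification; consult https://github.com/fides-dev/fides for the current interface.

        # Minimal sketch (assumed fides API; see the repository README for the
        # authoritative interface). Requires: pip install fides numpy scipy
        import numpy as np
        import scipy.optimize as so
        import fides

        def objective(x):
            # Objective returning (function value, gradient), as assumed for
            # gradient-based Hessian approximation schemes.
            return so.rosen(x), so.rosen_der(x)

        lb = np.full(2, -2.0)  # lower parameter bounds
        ub = np.full(2, 2.0)   # upper parameter bounds

        # BFGS is one of the Hessian approximation schemes evaluated in the paper;
        # the keyword names below are assumptions based on the README.
        opt = fides.Optimizer(objective, ub=ub, lb=lb, hessian_update=fides.BFGS())
        result = opt.minimize(np.zeros(2))  # start the local optimization at the origin
        print(result)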


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
