NeuroImage. 2014 Sep;98:521–527. doi: 10.1016/j.neuroimage.2014.04.040

Efficient gradient computation for dynamical models

B Sengupta, KJ Friston, WD Penny
PMCID: PMC4120812  PMID: 24769182

Abstract

Data assimilation is a fundamental issue that arises across many scales in neuroscience — ranging from the study of single neurons using single electrode recordings to the interaction of thousands of neurons using fMRI. Data assimilation involves inverting a generative model that can not only explain observed data but also generate predictions. Typically, the model is inverted or fitted using conventional tools of (convex) optimization that invariably extremise some functional — norms, minimum description length, variational free energy, etc. Generally, optimisation rests on evaluating the local gradients of the functional to be optimized. In this paper, we compare three different gradient estimation techniques that could be used for extremising any functional in time — (i) finite differences, (ii) forward sensitivities and a method based on (iii) the adjoint of the dynamical system. We demonstrate that the first-order gradients of a dynamical system, linear or non-linear, can be computed most efficiently using the adjoint method. This is particularly true for systems where the number of parameters is greater than the number of states. For such systems, integrating several sensitivity equations – as required with forward sensitivities – proves to be most expensive, while finite-difference approximations have an intermediate efficiency. In the context of neuroimaging, adjoint based inversion of dynamic causal models (DCMs) can, in principle, enable the study of models with large numbers of nodes and parameters.

Keywords: Augmented Lagrangian, Adjoint methods, Dynamical systems, Dynamic causal modelling, Model fitting

Highlights

  • We compare three methods to compute gradients in dynamical systems.

  • The methods are finite-differences, forward sensitivity and reverse adjoints.

  • The adjoint method is 50–70 times more efficient.

Introduction

An important goal of systems neuroscience is to integrate empirical data from various neuroimaging modalities with biologically informed models that describe the underlying generative processes. Here, the data to be explained are for example M/EEG and fMRI recordings made while subjects perform various experimental tasks, and the underlying neurodynamic processes are framed in terms of differential equations describing activity in neural masses, mean fields, or neural fields (David et al., 2006, Deco et al., 2008, Friston et al., 2003).

Considerable insight can be gained from studying the emergent properties of such neurodynamic processes. These can then be qualitatively compared with empirical data, allowing consilience among multiple levels of description (Gazzaniga, 2010, Hopfield and Brody, 2001, Wilson, 1999). An alternative approach is to directly fit neurodynamical models to neuroimaging data using standard model fitting procedures from statistics and machine learning (Bishop, 2006, Press et al., 1992). Differences in the generative processes induced by experimental manipulations can then be associated with changes in underlying brain connectivity. One example of such an approach is Dynamic Causal Modelling (DCM) (Friston et al., 2003) which fits differential equation models to neuroimaging data using a variational Bayesian scheme (Friston et al., 2007).

More generally, in the statistics and machine learning literature various methods have been employed to fit differential equations to data, from maximum likelihood approaches (Ramsay et al., 2007) to Bayesian sampling algorithms (Calderhead and Girolami, 2009, Vyshemirsky and Girolami, 2008). The majority of these convex optimisation approaches involve computing the gradient: the change in the cost function produced by a change in model parameters. This gradient is then combined with information from line searches (e.g., Wolfe's conditions) or with Newton, quasi-Newton (low-rank) or Fisher information based curvature estimates to update model parameters (Bishop, 1995, Nocedal and Wright, 2006, Press et al., 1992). The main computational bottleneck in these algorithms is the computation of the gradient (or the curvature) of the parametric cost function. This motivates the search for efficient methods to evaluate gradients.

This paper compares three different methods for computing gradients, and studies the conditions under which each is preferred. The first is the Finite Difference (FD) method, which is the simplest and most general method — and is currently used in DCM. The second is the Forward Sensitivity (FS; also known as tangent linear) method, which has previously been proposed in the context of modeling fMRI time series (Deneux and Faugeras, 2006). The third is the Adjoint Method (AM) which has previously been used in the context of dynamical systems theory (Wang, 2013), weather forecasting (Errico, 1997), image registration (Clark, 2011) and single-neuron biophysics (Stemmler et al., 2012).

The paper is structured as follows — the methods section describes each approach including a mathematical derivation of the adjoint method. Examples of the FS and AM updates are then provided for the case of simple Euler integration. The results section reports numerical simulations that disclose the scaling characteristics of each method. Simulations are provided for linear dynamical and weakly-coupled oscillator systems. We conclude with a discussion of the relative merits of each method.

Methods

We consider dynamical systems of the form

\dot{x} = f(x, p), \qquad j(x, p) = \frac{1}{2}\,\|y - g(x, p)\|^{2} (1)

where x is a state variable, the dot notation denotes the time derivative dx/dt, t is time, f(·) is the flow equation (dynamics), and p are the model parameters. The model produces a prediction via an observation function g(x, p), and an instantaneous cost function j(x, p) measures the squared difference from the data points y. The total cost is then given by the integral up to time point T

J(p) = \int_{0}^{T} j(x, p)\, dt. (2)

We consider three methods for computing the gradient dJ/dp.

Finite difference method

The (one-sided) finite difference approximation to the gradient is then

\frac{dJ}{dp_{i}} = \frac{J(p + \delta_{i}) - J(p)}{\delta_{i}} (3)

where δ_i denotes a small change (generally of the order of √ε, where ε is the machine epsilon) to the ith parameter. The error in the computation of this gradient is of order δ_i. The computation of dJ/dp requires P + 1 runs of the integration process: one for each model parameter plus one at the unperturbed parameters. It is also possible to use central differences

\frac{dJ}{dp_{i}} = \frac{J(p + \delta_{i}) - J(p - \delta_{i})}{2\delta_{i}} (4)

which has an error of order δ_i² but requires 2P + 1 runs of the integration process. Variations on the vanilla FD approach are discussed in Press et al. (1992) and Richtmeyer and Morton (1967).
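
As a concrete illustration, the following minimal MATLAB sketch (not taken from this paper; the linear system, step size, Euler integrator and cost are illustrative assumptions) computes both the one-sided and central finite-difference approximations to dJ/dp for a flow ẋ = Ax with p = A(:) and observation g(x) = x.

  function fd_gradient_demo
  % Sketch: one-sided and central finite differences for dJ/dp,
  % for the assumed linear flow dx/dt = A*x with p = A(:) and g(x) = x.
  D   = 2;  T = 1;  tau = 1e-3;      % states, horizon, Euler step (illustrative values)
  A0  = [-1 0.5; 0.2 -2];            % 'true' parameters used to generate the data
  y   = integrate(A0, D, T, tau);    % synthetic data trajectory
  p   = A0(:) + 0.05*randn(D*D,1);   % perturbed parameters at which dJ/dp is needed
  P   = numel(p);  J0 = cost(p, y, D, T, tau);
  h   = sqrt(eps);                   % step size of the order of sqrt(machine epsilon)
  g1  = zeros(P,1);  g2 = zeros(P,1);
  for i = 1:P
      e     = zeros(P,1);  e(i) = h;
      g1(i) = (cost(p+e, y, D, T, tau) - J0)/h;                          % P+1 runs in total
      g2(i) = (cost(p+e, y, D, T, tau) - cost(p-e, y, D, T, tau))/(2*h); % 2P+1 runs in total
  end
  disp([g1 g2])                      % the two approximations should agree closely
  end

  function x = integrate(A, D, T, tau)
  % Euler integration of dx/dt = A*x from x0 = ones(D,1)
  N = round(T/tau);  x = zeros(D,N);  xn = ones(D,1);
  for n = 1:N
      xn = xn + tau*(A*xn);  x(:,n) = xn;
  end
  end

  function J = cost(p, y, D, T, tau)
  % J(p) = 0.5*sum_n ||y_n - x_n||^2 with g(x) = x
  x = integrate(reshape(p, D, D), D, T, tau);
  J = 0.5*sum(sum((y - x).^2));
  end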

Forward Sensitivity method

The original dynamical model (Eq. (1)) can be implicitly differentiated w.r.t. the parameters to give

\frac{d\dot{x}}{dp} = \frac{\partial f}{\partial x}\frac{dx}{dp} + \frac{\partial f}{\partial p}. (5)

If the state variables are of dimension D and the parameters of dimension P then the quantity dẋ/dp is a D × P matrix, which can be vectorized to form a new flow function. This forms a new dynamical system of dimension D × P that can then be integrated using any numerical method to produce dx/dp as a function of time. The Forward Sensitivity approach has been known since the publication of Gronwall's theorem (Gronwall, 1919). The cost gradient is then given by accumulating the sensitivity derivative dx/dp over time according to:

\frac{dJ}{dp} = \int_{0}^{T} \frac{dj}{dp}\, dt, \qquad \frac{dj}{dp} = \frac{\partial j}{\partial x}\frac{dx}{dp} + \frac{\partial j}{\partial p} = \frac{\partial j}{\partial g}\frac{\partial g}{\partial x}\frac{dx}{dp} + \frac{\partial j}{\partial g}\frac{\partial g}{\partial p}. (6)
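
A minimal MATLAB sketch of this idea (not the authors' code; the linear flow, tolerances and dimensions are illustrative assumptions) augments the state vector with the vectorized sensitivity matrix and integrates both together:

  function fs_ode_demo
  % Flow and vectorised D-by-P sensitivity system (Eq. 5) integrated jointly,
  % here for the assumed flow dx/dt = A*x with p = A(:), so df/dx = A and df/dp = kron(x', I).
  D    = 2;  P = D*D;  T = 1;
  A    = [-1 0.5; 0.2 -2];                       % illustrative connectivity matrix
  z0   = [ones(D,1); zeros(D*P,1)];              % states, then vectorised dx/dp (zero at t = 0)
  opts = odeset('AbsTol', 1e-7, 'RelTol', 1e-7);
  [t, z] = ode15s(@(t,z) rhs(z, A, D, P), [0 T], z0, opts);
  S_end = reshape(z(end, D+1:end), D, P);        % sensitivity dx/dp at time T
  disp(S_end)
  end

  function dz = rhs(z, A, D, P)
  x  = z(1:D);
  S  = reshape(z(D+1:end), D, P);                % sensitivity matrix dx/dp
  dS = A*S + kron(x', eye(D));                   % Eq. (5): df/dx * dx/dp + df/dp
  dz = [A*x; dS(:)];
  end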

Euler example

This section illustrates the FS approach using first-order Euler integration of the dynamics

x_{n} = x_{n-1} + \tau f(x_{n-1}, p) (7)

at discrete times t(n). The FS method is based on differentiating this equation to give

\frac{dx_{n}}{dp} = \frac{dx_{n-1}}{dp} + \tau\left(\frac{\partial f}{\partial x}(x_{n-1})\,\frac{dx_{n-1}}{dp} + \frac{\partial f}{\partial p}\right). (8)

This method is illustrated in Fig. 1 where the solid path indicates a trajectory of points x_n for a dynamical system with parameters p and the dotted path indicates the trajectory x̄_n for the same dynamical system but with parameters p̄ = p + δ_i. The dotted path can be obtained from the solid path via the total derivative dx_n/dp_i in the direction of the perturbation, δ_i. The FS method provides a method for computing this derivative. Under a first order Euler approach for integrating the dynamics, this is implemented using the above recursion.

Fig. 1.


Forward Sensitivity. The solid path indicates a trajectory of points x_n, with n = 1…5, for a dynamical system with parameters p. The dotted path indicates the trajectory x̄_n for the same dynamical system but with parameters p̄ = p + δ_i. The dotted path can be reached from the solid path via the total derivative dx_n/dp. The Forward Sensitivity approach provides a method for computing this derivative.

Because the perturbed path (dotted in Fig. 1) can be reached from the original trajectory via the total derivative dx_n/dp, there is no need to separately integrate the system with parameters p̄. Geometrically, the points x̄_n in Fig. 1 can be reached via the solid and dashed lines (rather than the dotted lines).

We rewrite the recursion equation as

S_{x}(n) = S_{x}(n-1) + \tau\left(F_{x}(n-1)\,S_{x}(n-1) + F_{p}(n-1)\right) (9)

where

F_{p}(n) = \frac{\partial f}{\partial p}\bigg|_{x_{n}}, \qquad F_{x}(n) = \frac{\partial f}{\partial x}\bigg|_{x_{n}}, \qquad S_{x}(n) = \frac{dx}{dp}\bigg|_{x_{n}}. (10)

S_x is a [D × P] matrix, F_x is [D × D] and F_p is [D × P]. We then have

\frac{dJ}{dp} = \sum_{n=1}^{N} j_{x}(n)\,\frac{dx_{n}}{dp} = \sum_{n=1}^{N} j_{x}(n)\,S_{x}(n) (11)

where j_x(n) is the derivative of j(x, p) with respect to x, evaluated at x_n. This method requires the derivatives F_x and F_p. These will be specific to the dynamical model in question and, in this paper, are computed analytically. We provide the Euler example here as a simple illustration of the method. The numerical simulations in this paper use a more accurate integration method (see below).
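
The following MATLAB fragment (a sketch, not the authors' implementation; the linear flow and all numerical values are assumptions) makes the recursion explicit for f(x, p) = Ax with p = A(:), where F_x = A and F_p = kron(xᵀ, I) are available analytically:

  function fs_euler_demo
  % Sketch of the forward sensitivity recursion (Eqs. 8-11) for the assumed
  % linear flow f(x,p) = A*x with p = A(:) and observation g(x) = x.
  D   = 2;  P = D*D;
  T   = 1;  tau = 1e-3;  N = round(T/tau);   % illustrative horizon and step
  A0  = [-1 0.5; 0.2 -2];                    % 'true' parameters used to create data
  A   = A0 + 0.05*randn(D);                  % perturbed parameters; gradient taken here
  xt  = ones(D,1);  x = ones(D,1);           % data and model states share x0
  Sx  = zeros(D,P);                          % sensitivity matrix dx/dp, zero at t = 0
  dJdp = zeros(1,P);
  for n = 1:N
      xt = xt + tau*(A0*xt);                 % data trajectory, y_n = g(x_n) = x_n
      Fx = A;                                % df/dx at x_{n-1}
      Fp = kron(x', eye(D));                 % df/dp at x_{n-1}, i.e. d(A*x)/dA(:)
      Sx = Sx + tau*(Fx*Sx + Fp);            % Eq. (9): propagate sensitivities
      x  = x + tau*(A*x);                    % Eq. (7): propagate states
      jx = (x - xt)';                        % dj/dx = (g(x_n) - y_n)' for g(x) = x
      dJdp = dJdp + jx*Sx;                   % Eq. (11): accumulate the gradient
  end
  disp(dJdp)
  end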

Adjoint method

Errico (1997) and Giles and Pierce (2000) provide introductions to the adjoint method. A rigorous derivation of the adjoint method for dynamical models is provided in Cao et al. (2003) and in Hindmarsh and Serban (2002). Here, we provide an informal derivation, starting with the cost function

J(p) = \int_{0}^{T} j(x, p)\, dt. (12)

The constraints implied by the dynamics allow us to write the Lagrangian

L(p) = \int_{0}^{T} j(x, p)\, dt + \int_{0}^{T} \lambda^{T}\left(\dot{x} - f(x, p)\right) dt. (13)

Once the system has been integrated (solved for x) we have ẋ = f(x, p). Hence the second term in the Lagrangian disappears and we have

\frac{dJ}{dp} = \frac{dL}{dp}. (14)

This is the gradient we wish to compute. So far it may seem that we have made no progress, but it turns out that dL/dp can be computed efficiently.

Before proceeding further, we summarize the main ideas behind the adjoint method. The central concept is that the Lagrange vector λ^T constrains the dynamical system to variations around the forward path x_n. The Lagrange vectors are of the same dimension as x and form a time series. Algebraically, the contribution of the total derivative dx/dp to the gradient dJ/dp is made zero by setting λ^T appropriately. This means that the sensitivity derivative need not be calculated, resulting in a large computational saving. Instead, the gradient dJ/dp can be expressed as a function of λ^T. We will now go through this in a bit more detail:

The proof proceeds by differentiating Eq. (13) to give the gradient

\frac{dJ}{dp} = \int_{0}^{T}\left(\frac{\partial j}{\partial x}\frac{dx}{dp} + \frac{\partial j}{\partial p}\right) dt + \int_{0}^{T} \lambda^{T}\left(\frac{d\dot{x}}{dp} - \frac{\partial f}{\partial x}\frac{dx}{dp} - \frac{\partial f}{\partial p}\right) dt. (15)

The term involving the change in the total derivative, dẋ/dp, can be rewritten using integration by parts

\int_{0}^{T} \lambda^{T}\frac{d\dot{x}}{dp}\, dt = \left[\lambda^{T}\frac{dx}{dp}\right]_{0}^{T} - \int_{0}^{T} \frac{d\lambda^{T}}{dt}\frac{dx}{dp}\, dt. (16)

Substituting this into the previous expression and rearranging to group together terms involving the sensitivity derivative dx/dp gives

\frac{dJ}{dp} = \int_{0}^{T}\left(\frac{\partial j}{\partial x} - \lambda^{T}\frac{\partial f}{\partial x} - \frac{d\lambda^{T}}{dt}\right)\frac{dx}{dp}\, dt + \int_{0}^{T}\left(\frac{\partial j}{\partial p} - \lambda^{T}\frac{\partial f}{\partial p}\right) dt + \left[\lambda^{T}\frac{dx}{dp}\right]_{0}^{T}. (17)

The adjoint vector λ^T can be used to eliminate the first term involving the sensitivity derivative. This term is zero when:

\frac{d\lambda^{T}}{dt} = \frac{\partial j}{\partial x} - \lambda^{T}\frac{\partial f}{\partial x}. (18)

This is known as the adjoint equation and is used to compute λ^T. The gradient is then given by

\frac{dJ}{dp} = \int_{0}^{T}\left(\frac{\partial j}{\partial p} - \lambda^{T}\frac{\partial f}{\partial p}\right) dt + \left[\lambda^{T}\frac{dx}{dp}\right]_{0}^{T}. (19)

As our goal has been to avoid computation of the sensitivity derivative, dx/dp, we can eliminate the last term above by integrating the adjoint equation backward in time, starting with λ(T) = 0. The starting value for the adjoint equation is arbitrary: it can be proven that if λ_a(T) and λ_b(T) are two different starting values for the adjoint equation, with solutions λ_a and λ_b respectively, then the gradient dJ/dp computed with λ_a equals that computed with λ_b. Therefore, there exist infinitely many starting conditions for the adjoint equation that yield the same parametric gradient.

If the initial conditions do not depend on the parameters, as we assume for our numerical examples, then we have dx/dp = 0 at t = 0 and the gradient reduces to

\frac{dJ}{dp} = \int_{0}^{T}\left(\frac{\partial j}{\partial p} - \lambda^{T}\frac{\partial f}{\partial p}\right) dt. (20)

This equality can now be used to compute the parametric gradients, given the backwards solution of the adjoint equation.
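
As an illustration, the following MATLAB sketch (not the authors' code; the linear system, cost and tolerances are assumptions) integrates the flow forwards, integrates the adjoint equation backwards from λ(T) = 0 by passing a decreasing time span to ode15s, and then evaluates Eq. (20) by quadrature:

  function adjoint_ode_demo
  % Sketch: dJ/dp for the assumed flow dx/dt = A*x (p = A(:), g(x) = x),
  % with cost J = 0.5*int ||y - x||^2 dt, using the continuous adjoint (Eqs. 18 and 20).
  D    = 2;  T = 1;
  A0   = [-1 0.5; 0.2 -2];                              % 'true' parameters used for the data
  A    = A0 + 0.05*randn(D);                            % perturbed parameters
  opts = odeset('AbsTol', 1e-7, 'RelTol', 1e-7);
  solx = ode15s(@(t,x) A*x,  [0 T], ones(D,1), opts);   % forward (model) solution
  soly = ode15s(@(t,y) A0*y, [0 T], ones(D,1), opts);   % forward (data)  solution
  % adjoint equation in column form: dlam/dt = jx' - A'*lam, with jx = (g(x)-y)';
  % passing tspan = [T 0] makes ode15s integrate backwards in time from lam(T) = 0
  adj  = @(t,lam) (deval(solx,t) - deval(soly,t)) - A'*lam;
  soll = ode15s(adj, [T 0], zeros(D,1), opts);
  % gradient, Eq. (20): dJ/dp = -int_0^T lam' * Fp dt, with Fp = kron(x', I) and jp = 0
  ts   = linspace(0, T, 400);
  grd  = zeros(numel(ts), D*D);
  for k = 1:numel(ts)
      lam      = deval(soll, ts(k));
      Fp       = kron(deval(solx, ts(k))', eye(D));
      grd(k,:) = -lam' * Fp;
  end
  dJdp = trapz(ts, grd);                                % 1-by-D^2 gradient estimate
  disp(dJdp)
  end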

There are no restrictions on the functional form for the adjoint method to be viable — if one can pose the optimization problem via a Lagrangian, then the adjoint method can be used for any dynamical system (ordinary-, delay-, random- and partial-differential equations). The only constraint is the stability of the adjoint equation for the underlying dynamical system. Thus, static or dynamical systems that are routinely used in neuroimaging are amenable to an adjoint formulation under some loss function, including stochastic DCMs that have an analytical model for the noise (of class C^ω). Table 1 highlights the key differences between the methods and the crucial steps required in each of them.

Table 1.

Comparison of the different gradient computation methods. The flow eqn. is either linear or non-linear, with P parameters and N state variables.

Suitability: finite differences, arbitrary; forward sensitivities, N ≫ P; adjoint, P ≫ N.

Cost: finite differences, (1 + P) flow eqns.; forward sensitivities, P non-linear sensitivity eqns. + 1 flow eqn.; adjoint, 1 linear adjoint eqn. + 1 flow eqn.

Key steps: finite differences, (1) integrate flow eqn., (2) parametrically perturb the flow P times; forward sensitivities, (1) integrate the coupled flow and sensitivity eqns.; adjoint, (1) integrate flow eqn., (2) integrate adjoint eqn.

Euler example

The adjoint method starts from the specification of the Lagrangian. For us, this has the particular form

L = \frac{1}{2}\sum_{n}\|y_{n} - g(x_{n})\|^{2} + \sum_{n}\lambda_{n}\left(x_{n} - x_{n-1} - \tau f(x_{n-1}, p)\right) (21)

where the first term is the original cost function, the second term enforces the constraint embodied in the Euler integration of the state dynamics, and λ_n is a [1 × D] vector of Lagrange multipliers. Because L is a scalar, and the state x_n is a column vector, the Lagrange multipliers must be a row vector. It is in this sense that they are adjoint (or transposed) to the states. The derivative of L with respect to the states is then given by

\frac{dL}{dx_{n}} = -\left(y_{n} - g(x_{n})\right)^{T} g_{x}(n) + \lambda_{n} - \lambda_{n+1} - \tau\lambda_{n+1}F_{x}(n) (22)

where g_x(n) is the derivative of g(x, p) with respect to x, evaluated at x_n. Setting Eq. (22) to zero (i.e., imposing stationarity with respect to the states) gives

\lambda_{n} = \lambda_{n+1}\left(I + \tau F_{x}(n)\right) + \left(y_{n} - g(x_{n})\right)^{T} g_{x}(n). (23)

This is a backward recursion, known as the adjoint equation, that starts with λ_N = 0. After solving the adjoint equations we can enter λ_n into Eq. (20), giving

\frac{dJ}{dp} = \tau\sum_{n=1}^{N}\left(j_{p}(n) - \lambda_{n}F_{p}(n)\right) (24)

where j_p(n) is the derivative of j(x, p) with respect to p, evaluated at x_n. If the observation function does not depend on the model parameters then the first term disappears. A first order Euler Adjoint method has been used previously in the context of image registration (Clark, 2011). However, we provide the Euler example here as an illustration of the method. The numerical simulations in this paper use a more accurate integration method (see below).
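
To make the backward recursion concrete, here is a minimal MATLAB sketch (not the authors' code; the linear flow and numerical values are assumptions) that runs the forward Euler pass, solves Eq. (23) backwards from λ_N = 0, and accumulates the gradient via Eq. (24):

  function adjoint_euler_demo
  % Sketch of the discrete adjoint recursion (Eqs. 21-24) for the assumed
  % linear flow f(x,p) = A*x with p = A(:) and observation g(x) = x.
  D   = 2;  P = D*D;
  T   = 1;  tau = 1e-3;  N = round(T/tau);    % illustrative horizon and step
  A0  = [-1 0.5; 0.2 -2];                     % 'true' parameters used to create data
  A   = A0 + 0.05*randn(D);                   % perturbed parameters
  x   = zeros(D,N);  y = zeros(D,N);          % forward pass: store both trajectories
  xn = ones(D,1);  yn = ones(D,1);
  for n = 1:N
      xn = xn + tau*(A*xn);   x(:,n) = xn;    % model trajectory (perturbed parameters)
      yn = yn + tau*(A0*yn);  y(:,n) = yn;    % data trajectory (true parameters)
  end
  lam = zeros(N,D);                           % adjoint (row) vectors, lambda_N = 0
  for n = N-1:-1:1                            % backward pass, Eq. (23)
      Fx       = A;                           % df/dx at x_n
      gerr     = (y(:,n) - x(:,n))';          % (y_n - g(x_n))' with g_x = I
      lam(n,:) = lam(n+1,:)*(eye(D) + tau*Fx) + gerr;
  end
  dJdp = zeros(1,P);
  for n = 1:N                                 % accumulate the gradient, Eq. (24), j_p = 0
      Fp   = kron(x(:,n)', eye(D));           % df/dp evaluated at x_n
      dJdp = dJdp - lam(n,:)*Fp;
  end
  dJdp = tau*dJdp;
  disp(dJdp)
  end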

Stability

It is known that if flows are prescribed as ODEs, then their adjoint solutions are also stable (Cao et al., 2003). Under these conditions, the numerical stability of the adjoint system is guaranteed when the adjoint equation is integrated backwards in time, in the sense that the flow is reversed. Consider a linear autonomous system, f = Ax + B, where A ∈ ℝ^{n×n} and B ∈ ℝ^n are both invariant in time. Being linear in the states with pre-defined initial conditions, such a system can be integrated analytically to yield a solution as a sum of n exponential modes, each defined by an eigenvector and its eigenvalue. Such a system is asymptotically stable when the eigenvalues have negative real parts, i.e., Re(Λ(A)) < 0. For such a linear autonomous system the eigenvalues of the homogeneous adjoint equation, dλ/dt = −(∂f/∂x)^T λ = −A^T λ, have positive real parts, so the adjoint dynamics are asymptotically unstable when integrated forwards. If one now reverses the flow, i.e., dλ/dt = (∂f/∂x)^T λ = A^T λ, the eigenvalues have negative real parts and the dynamics are asymptotically stable. One can derive similar results for non-linear equations using a perturbation expansion, suggesting that asymptotic and uniform stability is guaranteed when the adjoint equations are integrated backwards.
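
The sign argument can be checked numerically with a few lines of MATLAB (an illustrative sketch; the matrix A is an arbitrary stable example):

  A = [-1 0.5; 0.2 -2];   % Re(eig(A)) < 0: the forward flow dx/dt = A*x is stable
  disp(eig(A))            % negative real parts
  disp(eig(-A'))          % homogeneous adjoint dlam/dt = -A'*lam: positive real parts (unstable forwards)
  disp(eig(A'))           % reversed adjoint dlam/dt = A'*lam: negative real parts (stable)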

Integration

For the numerical results in this paper, we used MATLAB's ode15s function, which implements a variable order method for integrating stiff differential equations (Shampine and Reichelt, 1997). Two important parameters governing the operation of this algorithm are the absolute, a, and relative, r, error tolerances. The estimated error in each integration step is constrained to be less than max(r|x_n|, a).

The absolute and relative tolerances were set to 10^-7 for each of the gradient computation methods, although results were also obtained with different sets of tolerances, a_FD, a_FS, a_AM and r_FD, r_FS, r_AM, for the Finite Difference, Forward Sensitivity and Adjoint Methods respectively. When tolerances were set differently, these values were tuned for each problem (linear/nonlinear) so as to achieve good agreement among the methods.

Results

Custom MATLAB scripts were written to implement each of the gradient computation methods.

Linear models

First we consider the linear models

\dot{x} = Ax (25)

where x is a D-dimensional state vector with initial value x_0 = 1, and A is a D × D connectivity matrix. Readers familiar with DCM for fMRI will recognize A as the endogenous or average connectivity matrix. A model with D states therefore has P = D² parameters (Fig. 2(A)). The system is integrated from time 0 to T. To ensure stability, we constructed A using the linear expansion

A = \sum_{d=1}^{D} q_{d}\, v_{d} v_{d}^{T} (26)

where v_d ∼ N(0, 1) are standard D-dimensional Gaussian random vectors, which are serially orthogonalized. The scalars q_d are negative real numbers so that the corresponding eigenstates are exponentially decaying modes. The values of q_d were set so that the corresponding time constants were between T/5 and T. Fig. 2(B) shows the time series for five such eigenstates.
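
A possible MATLAB construction consistent with Eq. (26) is sketched below (not necessarily the exact script used; the use of orth for the serial orthogonalization and the uniform draw of time constants are assumptions):

  % Sketch: build a stable connectivity matrix A = sum_d q_d * v_d * v_d'
  D  = 5;  T = 0.4;                 % illustrative dimension and horizon
  V  = orth(randn(D));              % columns: orthonormalised Gaussian random vectors
  tc = T/5 + (T - T/5)*rand(1,D);   % time constants between T/5 and T (assumed uniform draw)
  q  = -1./tc;                      % negative eigenvalues, i.e. exponentially decaying modes
  A  = V*diag(q)*V';                % Eq. (26)
  disp(eig(A))                      % all eigenvalues have negative real parts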

Fig. 2.


Linear System. (A) The 5-dimensional state-space model and (B) the linear evolution of its eigenstates.

For each model dimension considered (see below) we generated a state trajectory using known model parameters generated as described above. We then created an observable data time series y_n = g(x_n) with the observation function g(x_n) = x_n, that is, all of the dynamical states are observed.

We then created ‘perturbed’ parameters by adding Gaussian noise with a standard deviation of 10% of the original parameters. The cost function was defined as

J = \frac{1}{2}\sum_{n}\|y_{n} - g(x_{n})\|^{2}. (27)

To summarize, the ‘data points’ y_n were created using the original parameters and the ‘model predictions’ g(x_n) used the perturbed parameters. Gradients were then estimated at this perturbed point.

The systems were integrated with the tolerances of FD and FS fixed at 10^-7. The tolerance of AM was adjusted so as to achieve the best fit to the FD based gradient estimate; for the efficiency-scaling simulations we fixed it at a less stringent value of 10^-3. This is illustrated in Fig. 3(A), which shows the estimated gradients for a D = 5 dimensional linear system. Setting the tolerance of the AM method to 10^-3 did not affect the mean-squared deviation between the gradients obtained with the FD and AM methods (data not shown).

Fig. 3.


Computational efficiency for linear systems. (A) Comparison of the parametric gradient obtained by the three methods. (B) Scaling of run-time as a function of the number of nodes. The absolute and relative tolerances of the FD and FS methods were set to 10^-7 while the tolerances for the AM method were fixed at 10^-3. Simulation time was fixed at 400 ms.

We then compared the three gradient computation methods. Fig. 3(B) plots the computation time as a function of state dimension. For a 28-node system with 784 model parameters the computation time for the adjoint method is 77 times less than for the finite difference method.

Nonlinear models

Next, we consider weakly coupled oscillators of the form

\dot{x}_{i}(t) = f_{i} + \sum_{j=1, j\neq i}^{D}\left[\alpha_{ij}\sin\left(x_{i}(t) - x_{j}(t)\right) + \beta_{ij}\cos\left(x_{i}(t) - x_{j}(t)\right)\right] (28)

where the model parameters comprise f, α and β. A model with D states therefore has P = 2D² − D parameters. We used a cost function equal to the mean square deviation between observed and predicted state trajectories, i.e., the norm of the prediction error (again, all states were observed).
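
For reference, Eq. (28) can be coded as a flow function in a few lines of MATLAB (a sketch, not the authors' implementation; the function name and argument layout are assumptions):

  function dx = wco_flow(~, x, f, alpha, beta)
  % Weakly coupled oscillator flow, Eq. (28).
  % x: D-by-1 phases; f: D-by-1 intrinsic terms; alpha, beta: D-by-D coupling matrices.
  dphi = bsxfun(@minus, x, x');             % dphi(i,j) = x_i - x_j
  C    = alpha.*sin(dphi) + beta.*cos(dphi);
  C(1:size(C,1)+1:end) = 0;                 % zero the diagonal, i.e. exclude j = i
  dx   = f + sum(C, 2);                     % sum the interactions over j
  end

Such a handle can then be passed to an integrator, e.g. ode15s(@(t,x) wco_flow(t, x, f, alpha, beta), [0 T], x0).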

The tolerance parameters of the integration process were set identical to those used for the linear models. Again, the adjoint equation being a linear first order ODE enables the use of less stringent tolerances (10^-3). This process was implemented for a D = 5 dimensional problem and Fig. 4(A) shows the estimated gradients.

Fig. 4.


Computational efficiency for non-linear systems. (A) Comparison of the gradient obtained by the three methods. Here, the last five parameters quantify the intrinsic oscillator frequencies, and the first 40 parameters the sine and cosine interaction terms. (B) Scaling of run-time as a function of the number of nodes. The absolute and relative tolerances of the FD and FS methods were set to 10^-7 while the tolerances for the AM method were fixed at 10^-3. Simulation time was fixed at 100 ms.

We then compared the three gradient computation methods. Fig. 4(B) plots the computation time as a function of state dimension. For a 24-node system with 1128 model parameters the computation time for the adjoint method is 50 times less than for the finite difference method.

The efficiency of the AM formulation is due to two reasons: first, the adjoint equation is linear and, second, it is integrated only once to compute the gradient. Because the adjoint equation is linear, its condition number is low, enabling any ODE integrator to integrate it with ease. Indeed, if the ODE integrator is subjected to unnecessarily stringent tolerances it spends more time integrating the adjoint equation. Thus, the advantage of the adjoint scheme reflects both the parsimony of the integration scheme and the linearity of the adjoint equation, which permits less conservative tolerances.

Discussion

Optimization theory lends mathematical well-posedness to the problem of biological data assimilation by formalizing the relationship between empirically measured data and a model generating those responses. In this paper, we compared three different methods for numerical gradient estimation, which forms an integral part of any convex optimization framework. Our comparison establishes that the adjoint method is computationally more efficient for numerical estimation of parametric gradients for state-space models — both linear and non-linear, as in the case of a dynamic causal model (DCM). As is apparent from the gradient equations, the adjoint method is efficient when the number of parameters is much greater than the number of states determining the cost function. The converse is true for the Forward Sensitivity approach, although for large state-space models finite-difference based gradients prove to be beneficial. Two remarks can be made about the adjoint formulation. First, regardless of whether the flow is linear or non-linear the adjoint method requires the integration of a single linear equation — the computational efficiency is inherent in the structure of the equation. Second, the appearance of a transpose on the adjoint vector implies that the flow of information in the system of equations is reversed; it is in this sense that the adjoint equations are integrated backwards in time.

Although adaptive error correction is invariably used in the integration of differential equations, the numerical simulations suggest that the tolerances used for integrating the flow and adjoint differential equations are vital in determining the accuracy of the parametric gradients, due to the presence of discretization error. In theory, substituting the solution back into the flow equation should yield a zero residual, but owing to discretization error the residual is generally non-zero. The same is true for the adjoint equation. In fact, a theorem by Becker and Rannacher (2001) shows how the discretization error of the gradient depends on the average of the errors and the residuals accumulated in the integration of the flow and the adjoint equations (Bangerth and Rannacher, 2003). This is also the case for obtaining gradients via finite-differencing, where we find that an accurate discretization of the flow equations is a prerequisite for parametric gradients that are a reliable estimate of the true gradient.

It is known that if the flows are prescribed as ODEs the numerical stability of the adjoint system is guaranteed when the adjoint equation is integrated backwards in time, in the sense that the flow is reversed. Our derivation of the adjoint method is mathematically informal so as to illustrate the basic working principle; rigorous mathematical proofs that accommodate higher order differential algebraic equations, time-dependent parameters or objective functionals that depend on initial conditions are available elsewhere (Cao et al., 2003).

For DCM inversions that allow problem specification in a pre-defined form it may generally be more time-efficient to derive the gradient functions analytically rather than using automatic differentiation (Bischof et al., 2012). Automatic differentiation is particularly important for partial differential equations (PDEs) with 3-dimensional representations, where deriving gradients by hand is impractical; automating the process also makes it resilient to error (Sala et al., 2004). For a PDE-constrained optimization problem the solution is governed by a fully coupled Karush–Kuhn–Tucker (KKT) system of equations. These can be computationally expensive for parabolic and hyperbolic PDEs, and can display slow convergence of the defined objective functional (ill-conditioning). The adjoint formulation remedies this by decoupling the coupled PDEs and replacing them with iterative solves of a linear adjoint PDE. An additional attraction of adjoint-based gradient methods for PDE-constrained optimization is that mesh-independent convergence can be attained. Further speedup could also be obtained by using the compiled implementations of forward and adjoint sensitivity methods available in the SUNDIALS time integration package (Hindmarsh and Serban, 2002). This code is written in C and may offer substantial speed advantages over MATLAB implementations.

For data assimilation, it is only rarely that we have precise information on the states or the parameters (Wiener, 1964). Is the adjoint method equally efficient when there is noise on the states and the parameters? One way to represent uncertainty in a mathematical model, whether static or dynamic, is to formulate a polynomial chaos expansion (Wiener, 1938) for each noisy state or parameter. This then enables the characteristic statistical quantities to be evaluated as functions of the expansion coefficients — the uncertainty becomes parameterized. The estimation of the numerical gradient can then proceed as for a deterministic dynamical model, where the computational burden does not depend on the number of parameters (Alekseev et al., 2010). Alternatively, adjoint methods can be gracefully combined with Markov Chain Monte Carlo (MCMC) sampling-based evaluation of the posterior densities (Ghate and Giles, 2005). In a forthcoming paper we address how second-order adjoint gradient estimates could be obtained in the context of Bayesian inversion of neural masses, mean fields, and neural field equations.

Constrained optimization problems arise in many scientific fields, from neuroscience to financial mathematics; there is therefore a fundamental need for efficient computational methodologies. Our work promotes such an endeavour, especially for data-sets arising in neuroscience, for example the inversion of large-scale DCMs that are routinely used to test hypotheses about different functional brain architectures.

Acknowledgments

This work is supported by a Wellcome Trust/DBT Early Career (500182/Z/11/Z) fellowship to BS. KJF and WDP are supported by the Wellcome Trust (091593/Z/10/Z). Initial code-base used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Contributor Information

B. Sengupta, Email: b.sengupta@ucl.ac.uk.

K.J. Friston, Email: k.friston@ucl.ac.uk.

W.D. Penny, Email: w.penny@ucl.ac.uk.

References

  1. Alekseev A.K., Navon I.M., Zelentsov M.E. The estimation of functional uncertainty using polynomial chaos and adjoint equations. Int. J. Numer. Methods Fluids. 2010;67:328–341.
  2. Bangerth W., Rannacher R. Adaptive Finite Element Methods for Differential Equations. ETH Zürich Lectures in Mathematics. Birkhäuser; 2003.
  3. Becker R., Rannacher R. An optimal control approach to a posteriori error estimation in finite element methods. Acta Numerica. 2001;10:1–102.
  4. Bischof C., Bücker H.M., Vehreschild A., Willkomm J. Automatic Differentiation for MATLAB (ADiMat). MATLAB-Day, Aachen, Germany, October 2012.
  5. Bishop C.M. Neural Networks for Pattern Recognition. Oxford University Press; Oxford: 1995.
  6. Bishop C.M. Pattern Recognition and Machine Learning. Springer; New York: 2006.
  7. Calderhead B., Girolami M. Estimating Bayes factors via thermodynamic integration and population MCMC. Comput. Stat. Data Anal. 2009;53(12):4028–4045.
  8. Cao Y., Li S., Petzold L., Serban R. Adjoint sensitivity analysis for differential-algebraic equations: the adjoint DAE system and its numerical solution. SIAM J. Sci. Comput. 2003;24:1076–1089.
  9. Clark A. Geodesic Shooting for Anatomical Curve Registration on the Plane. PhD thesis. Department of Aeronautics, Imperial College; London: 2011.
  10. David O., Kiebel S., Harrison L., Mattout J., Kilner J., Friston K. Dynamic causal modeling of evoked responses in EEG and MEG. NeuroImage. 2006;30(4):1255–1272. doi: 10.1016/j.neuroimage.2005.10.045.
  11. Deco G., Jirsa V.K., Robinson P.A., Breakspear M., Friston K. The dynamic brain: from spiking neurons to neural masses and cortical fields. PLoS Comput. Biol. 2008;4(8):e1000092. doi: 10.1371/journal.pcbi.1000092.
  12. Deneux T., Faugeras O. Using nonlinear models in fMRI data analysis: model selection and activation detection. NeuroImage. 2006;32:1669–1689. doi: 10.1016/j.neuroimage.2006.03.006.
  13. Errico R. What is an adjoint model? Bull. Am. Meteorol. Soc. 1997;78:2577–2591.
  14. Friston K.J., Harrison L., Penny W. Dynamic causal modelling. NeuroImage. 2003;19(4):1273–1302. doi: 10.1016/s1053-8119(03)00202-7.
  15. Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. Variational free energy and the Laplace approximation. NeuroImage. 2007;34(1):220–234. doi: 10.1016/j.neuroimage.2006.08.035.
  16. Gazzaniga M. Neuroscience and the correct level of explanation for understanding mind. Trends Cogn. Sci. 2010;14:291–292. doi: 10.1016/j.tics.2010.04.005.
  17. Ghate D., Giles M.B. Inexpensive Monte Carlo uncertainty analysis. Symposium on Applied Aerodynamics and Design of Aerospace Vehicles. 2005.
  18. Giles M., Pierce N. An introduction to the adjoint approach to design. Flow Turbul. Combust. 2000;65:393–415.
  19. Gronwall T. Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Ann. Math. 1919;20:292–296.
  20. Hindmarsh A., Serban R. User Documentation for CVODES, an ODE Solver with Sensitivity Analysis Capabilities. Technical report. Centre for Applied Scientific Computing, Lawrence Livermore National Laboratory; 2002.
  21. Hopfield J., Brody C. What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration. Proc. Natl. Acad. Sci. 2001;98(3):1282–1287. doi: 10.1073/pnas.031567098.
  22. Nocedal J., Wright S.J. Numerical Optimization. 2nd edition. Springer; New York: 2006.
  23. Press W.H., Teukolsky S.A., Vetterling W.T., Flannery B.P. Numerical Recipes in C. Cambridge University Press; New York: 1992.
  24. Ramsay J., Hooker G., Campbell D., Cao J. Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. Ser. B. 2007;69(5):741–796.
  25. Richtmeyer D., Morton K. Difference Methods for Initial Value Problems. Wiley; New York: 1967.
  26. Sala M., Heroux M.A., Day D.M. Trilinos Tutorial. Technical Report SAND2004-2189. Sandia National Laboratories; 2004.
  27. Shampine L., Reichelt M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997;18:1–22.
  28. Stemmler M., Sengupta B., Laughlin S.B., Niven J.E. Energetically optimal action potentials. Advances in Neural Information Processing Systems. 2012:1566–1574.
  29. Vyshemirsky V., Girolami M. Bayesian ranking of biochemical system models. Bioinformatics. 2008;24(6):833–839. doi: 10.1093/bioinformatics/btm607.
  30. Wang Q. Forward and adjoint sensitivity computation of chaotic dynamical systems. J. Comput. Phys. 2013;235:1–13.
  31. Wiener N. The homogeneous chaos. Am. J. Math. 1938;60(4):897–936.
  32. Wiener N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. The MIT Press; 1964.
  33. Wilson H. Spikes, Decisions and Actions: The Dynamical Foundations of Neuroscience. Oxford University Press; New York: 1999.
