Abstract
Identifying a coupled dynamical system out of many plausible candidates, each of which could serve as the underlying generator of some observed measurements, is a profoundly ill-posed problem that commonly arises when modelling real-world phenomena. In this review, we detail a set of statistical procedures for inferring the structure of nonlinear coupled dynamical systems (structure learning), which has proved useful in neuroscience research. A key focus here is the comparison of competing models of network architectures—and implicit coupling functions—in terms of their Bayesian model evidence. These methods are collectively referred to as dynamic causal modelling. We focus on a relatively new approach that is proving remarkably useful, namely Bayesian model reduction, which enables rapid evaluation and comparison of models that differ in their network architecture. We illustrate the usefulness of these techniques through modelling neurovascular coupling (cellular pathways linking neuronal and vascular systems), whose function is an active focus of research in neurobiology and the imaging of coupled neuronal systems.
This article is part of the theme issue ‘Coupling functions: dynamical interaction mechanisms in the physical, biological and social sciences'.
Keywords: dynamic causal modelling, Bayesian model selection, Bayesian model reduction
1. Introduction
This paper sets out a general method for addressing the problem of structure learning, namely identifying a coupled dynamical system that best accounts for empirical observations [1,2]. In this context, a hypothesis about the structure of a system, for example the connectivity architecture of a neural network, is expressed formally as a model [3]. The objective is to search over models (e.g. by pruning redundant connections) to arrive at a network architecture that optimally explains the data. This approach rests upon dynamic causal modelling (DCM) [4], Bayesian model selection (BMS) [5] and Bayesian model reduction (BMR) [6], which are implemented in the freely available software called Statistical Parametric Mapping (SPM) [7] (see table 1 for further description of terminology used in this paper). These methods have been developed for the analysis of large-scale recordings of brain activity; however, they could be conveniently applied in other domains, where mechanistic modelling based on empirical data is of interest.
Table 1.
Glossary of terms that are used in the paper and their description.
Term | general description |
---|---|
Bayesian belief updating | Bayesian belief updating is the updating of probabilities as new data are acquired. An example is found in the context of inverting non-stationary EEG time series that undergo transitions into and out of paroxysmal activity [8]. In this case, EEG data are first divided into several locally (quasi-) stationary segments. Then, DCM is performed on the first segment to infer posterior parameter estimates. The ensuing estimate becomes the prior for the subsequent segment and so on. This is Bayesian belief updating over time. A trajectory of parameters over segments can then be constructed to characterize the dynamics of the non-stationary data. |
Bayesian fusion and multi-modal data integration | Bayesian fusion makes combined inferences about a (physical) system based on different kinds of observation, in order to model the system's dynamics. For instance, Bayesian fusion of models fitted to fMRI and M/EEG data can be used to understand the function of neurovascular coupling. The spatial specificity of fMRI is employed to localize active neuronal sources. This spatial information is used as a prior for source localization within a model of M/EEG data, which is employed to estimate neuronal parameters. The estimated neuronal parameters are taken as priors for the inversion of haemodynamic responses (i.e. Bayesian fusion) from fMRI time series. Therefore, combining these modalities provides a better understanding of neurovascular coupling than could be derived from either modality independently. |
Bayesian model comparison | Quantification of the relative evidence for different models of the same data. Typically, this is expressed as a Bayes factor, which is the ratio of the model evidence (marginal likelihood) for each model relative to a selected comparison model. |
Bayesian model reduction (BMR) | A statistical method for rapidly computing the posterior probability density over parameters and the log model evidence for a reduced model, given the parameters and log model evidence of a full model. Here, reduced and full models differ only in their priors. Under Gaussian assumptions, this has an analytic form. |
Bayesian model selection | The selection of one or more models with the highest evidence from a set of candidate models following Bayesian model comparison. |
canonical microcircuit (CMC) | The CMC is a biologically informed microcircuit model of laminar connectivity in the cortical column [9,10]. It comprises four populations whose activity can replicate a realistic pattern of M/EEG signals. Each population generates postsynaptic potentials (modelled by second-order differential equations) induced by presynaptic firing rates from external sources (interregional or distal populations, and/or exogenous inputs). These postsynaptic potentials generate presynaptic firing rates (via a sigmoid transformation), which in turn excite or inhibit other populations. |
compressive sensing based dynamical system identification [11] | Compressive sensing is a well-established signal processing method for reconstructing sparse signals from data. Recently, it has been applied to the identification of fully observed dynamical systems (i.e. all states of the system are measured), under the assumption that the evolution of states can be modelled using (linear-in-parameters) power series and that the underlying structure (coupling between nodes) of the system is sparse. Given these assumptions, it can be shown that the relation between measured time series and model states can be formulated as a linear function, parametrized by unknown parameters. Compressive sensing is then applied to find the sparse parameter vector that best explains the data. |
cortical column | the human cortex can be approximately divided into cylinders of diameter 500 µm where neurons within each cylindrical column activate in response to a particular stimulus. |
data assimilation [12] and model identification [13] | data assimilation is a term that was coined by meteorologists in the mid-twentieth century. It uses (and combines) a wide range of mathematical methods, such as autoregressive models, nonlinear Kalman filters, statistical interpolation, nonlinear time series analysis and nonlinear system identification, to establish models useful for weather prediction [14]. Model identification is a more general term for constructing models that describe a phenomenon, encompassing both model construction and parameter estimation. |
data feature | in general, quantities that are useful for distinguishing/discriminating different regimes of a system are referred to as data features. In DCM, conventionally, models are either fitted to the raw data directly, or to data features such as autoregressive coefficients that model the power spectrum, power spectral densities (from time-frequency analyses) and principal components/modes and the phase of data. Extracting particular features of the data for modelling is known as feature selection, and is typically conducted as an initial step in the modelling pipeline. |
effective connectivity | effective connectivity can be understood in relation to functional and structural connectivity [15]. Functional connectivity is defined as a statistical dependency (such as correlation or transfer entropy) between multi-channel brain data (e.g. fMRI, M/EEG). Structural connectivity refers to the strength of anatomical connections between brain regions and can be estimated, for instance, using diffusion tensor imaging. Effective connectivity is the directed effect of neural populations on each other, under a particular experimental setting or task. Inferring effective connectivity typically requires combining brain imaging data with a biologically informed model of brain activity. This enables one to elucidate (i) the underlying biological generators of the data and (ii) how different experimental inputs (or conditions) alter the effective connectivity. |
electroencephalography (EEG) and magnetoencephalography (MEG) | M/EEG are non-invasive neuroimaging techniques that capture dynamics of neuronal activity with millisecond temporal resolution (on the same order as the temporal dynamics of synaptic activity). EEG captures ensemble neuronal membrane potential voltages using grids of electrodes placed on the scalp, whereas MEG measures accompanying fluctuations in magnetic fields that can be captured using arrays of magnetometers (known as superconducting quantum interference devices). The MEG signal is subject to less distortion by the skull and scalp than EEG. |
functional magnetic resonance imaging | fMRI is a non-invasive neuroimaging technique that measures changes in blood flow and oxygenation, caused by neuronal activation and the neurons' subsequent consumption of oxygen, with fine spatial resolution (effectively up to 0.5 mm). fMRI measures the blood-oxygen-level-dependent (BOLD) response to brain activity. The changes in the measurement at each location (voxel) form a time series, which is analysed using the analytic techniques reviewed here. |
haemodynamic response | this describes the process by which neuronal activation causes changes in blood flow, blood vessel volume and the dynamics of deoxyhaemoglobin. It can be modelled by a dynamical system known as the extended balloon model [16]. |
ill-posed problem and model identification in DCM | the identification of biological mechanisms generating measurable brain data is, in general, an ill-posed problem. This is predominantly because (i) for most brain imaging techniques only partial, indirect, noisy and nonlinearly mixed data are available (e.g. EEG is mostly generated by the activity of a small proportion of pyramidal populations and has poor spatial resolution, fMRI is an indirect measure of neuronal activity with poor temporal resolution due to the smoothing effect of haemodynamics) and (ii) given the complexity of biological models, there are typically many parameter settings giving rise to similar data. With DCM, the ill-posed problem is addressed by setting suitable prior constraints on model parameters. |
inhibitory, excitatory and pyramidal neuronal population [17] | intuitively, postsynaptic potentials generated by inhibitory interneurons reduce the depolarization and subsequent activity of target populations. Conversely, excitatory interneurons increase the activity of target populations. Pyramidal cells are excitatory and can be found in superficial and deep layers of the cortical column. |
Kullback–Leibler divergence | this is a measure of the difference between two distributions. However, it is not a metric, since it is not symmetric [18]. As explained in this paper, it can be used to score the complexity of a model (the difference between posterior and prior probability densities over model parameters). |
log likelihood | the log probability of the observed data given the model, under a particular set of parameters [19,20]. In this paper, it serves as the accuracy of a model in terms of the probability of producing the observed data features. Integrating over the unknown model parameters gives the model evidence (or marginal likelihood). |
neurovascular coupling and neural drive function | neurovascular coupling refers to physiological pathways that enable communication between neurons and blood vessels [21]. A neuronal drive function is the scaled sum of neuronal activity, which is estimated using DCM for electrophysiological (e.g. EEG/MEG) recordings, and forms the input to a model of the neurovascular system [22]. |
model architecture | model architecture—in this paper—refers to a dynamical system distributed on a graph, consisting of nodes (e.g. neuronal populations) and edges; i.e. coupling or connections between the nodes. The dynamics of each node are governed by differential equations. In DCM, connections are set as being present (informed by the data) or absent (fixed at zero) by specifying the variance of Gaussian prior probability densities. |
statistical parametric mapping [7] | SPM is freely available software for analysing brain imaging data such as fMRI, MEG and EEG. It includes statistical routines (e.g. general linear model, random field theory, variational Bayes, voxel-based morphometry, statistical hypothesis testing, statistical signal processing, to name but a few). SPM also refers to a method for producing maps of parametric statistics, to test for distributed neural responses over the brain. The SPM software package also includes the dynamic causal modelling (DCM) toolbox, which enables the modelling of the underlying biological generators of neuroimaging data. |
time-evolving dynamical Bayesian inference [23] | a form of sequential Bayesian inference, developed to infer time-dependent coupling of noise-driven weakly coupled dynamical systems (e.g. coupled limit-cycle, phase and chaotic oscillators). Given successive segments (in time) of data, the algorithm maximizes the likelihood (and posterior probability density over unknown parameters) in the first segment. Then, a corrected form of posterior estimate of the parameters is considered as the prior of parameters in the next segment and so on. With this method, belief updating from one segment to the next is accompanied by applying a diagonal (correction) matrix to the covariance estimate of the parameters, preventing correlations among parameters propagating over time. |
tracking-based parameter identification in dynamical systems [24] | constant parameters in a dynamical system have zero temporal evolution. Therefore, they can be considered as state variables with trivial dynamics. The trivial dynamics of parameters can be absorbed into a state space model by augmenting the original equations of the system (thereby termed an augmented dynamical system). A filtering method (e.g. a nonlinear Kalman filter) can then be applied to reconstruct (estimate) the dynamics of the augmented system, which includes the slowly fluctuating parameters. The mean and variance of the resulting parameter estimates can be taken as posterior estimates of the constant parameters of the original dynamical system. |
variational Laplace | variational Bayes under the Laplace assumption. A scheme for approximate Bayesian inference in which the priors and approximate posteriors are (multivariate) normal. In contrast to most variational schemes, variational Laplace uses a gradient ascent on variational free energy. This is important because it eschews the need for analytic solutions and the use of conjugate priors. This means that model inversion uses exactly the same optimization scheme for any given generative model. |
The basic idea behind DCM is to convert data assimilation, identification and structure learning problems into a generic inference problem—and then use variational Bayesian techniques to infer all the unknowns, ranging from unobservable or latent states through to the structure or form of the dynamical system that best accounts for the data at hand. In other words, DCM enables qualitative and quantitative questions to be asked about the dynamical system generating data (usually time series).
Formally, DCM is the Bayesian inversion of a biophysically informed dynamical (i.e. state space) model, given some time series data, usually neuroimaging data (magneto/electroencephalography (M/EEG), functional magnetic resonance imaging (fMRI)). The models that underwrite DCM are generally specified in terms of (ordinary, stochastic or delay) differential equations describing the coupling within and among dynamical systems. Hypotheses about the architecture of (directed) coupling are tested by comparing the evidence for the data under different models [4]. DCM can be employed to infer coupling (within each node and between nodes) of a nonlinear dynamical system, given empirical responses of a system (i.e. the brain) to experimental inputs or different dynamical regimes. This proceeds by inferring condition-specific parameters, which explain/model how changes in experimental context (or different dynamical regimes) are mediated by changes in particular connections in the model. This ability to model context-sensitive changes in coupling distinguishes DCM from other data assimilation and identification methods (e.g. parameter tracking [24], compressive sensing based dynamical system identification [11] and time-evolving dynamical Bayesian inference of coupled dynamical systems [23], to name a few). The outcomes of model inversion using DCM are a posterior probability density over model parameters (that parametrize coupling and context-sensitive effects) and the relative probability of having observed the data under each model (model evidence). This model evidence or marginal likelihood can be used to draw conclusions by comparing the evidence for different models—known as Bayesian model comparison and selection (BMS).
In the past two decades, DCM for fMRI has been applied in many studies in the field of cognitive neuroscience (e.g. ageing [25], memory [26]) as well as psychiatric disorders [8,27,28]. In parallel, using the same conceptual principles, DCM has also been applied to M/EEG data, to disambiguate the neuronal causes of electromagnetic responses such as induced responses [29], phase coupling [9] and event-related potentials [30,31], and to provide insights into the underlying generators of neurological disorders [32–34]. More recently, DCM has motivated and contributed to the development of research in theoretical neuroscience, such as predictive coding [35], active inference [36] and, interestingly, the Bayesian brain hypothesis, which aims to establish the mathematical foundations of how the brain interacts with—and understands—its environment [19].
In this paper, we review and illustrate recent developments in Bayesian inference that enable an efficient procedure for learning the structure of coupled dynamical systems. First, we present the theoretical foundations of structure learning—using DCM—in a general form that may have wider application in engineering, physics and mathematical biology. We then present an example that highlights the usefulness of structure learning in the field of neuroscience. All software developments relating to the results in this paper are freely available through the academic SPM software (https://www.fil.ion.ucl.ac.uk/spm/). A glossary of technical terms used in this paper is provided in table 1.
2. Dynamic causal modelling and structure learning
The pipeline for studying the underlying generators of neuroimaging data using DCM is shown in figure 1. The procedure begins by designing and conducting an experiment to study some particular function of the brain. Data features are then selected from the measured data and one or more hypotheses are formally expressed as (biophysically informed) coupled dynamical systems. A Bayesian (variational Laplace) scheme is then used to infer the settings of the parameters for each model (e.g. coupling strengths) and to quantify each model's evidence. Structure learning is then performed to compare the alternative model architectures, using BMS and BMR, in order to identify the best explanation for the underlying generators of the data. In the following sections, we describe each of these steps in turn, before turning to a worked example.
Figure 1.
Structure learning using dynamic causal modelling. (a) The pipeline begins by designing an experiment to study a functional aspect of the brain, (b) followed by recording brain activity using neuroimaging devices, such as MEG or fMRI. (c) Feature selection is performed on the neuroimaging data, for example by calculating evoked responses by averaging over trials, or transforming neuroimaging data to the frequency domain. (d) Having prepared the data, the experimenter postulates several hypotheses (specified as models) about the underlying generation of the neuroimaging data. These can be expressed in terms of connections between and within brain regions (effective connectivity). (e) These models are fitted to the data by finding the setting of the parameters that optimizes the variational free energy (F). (f) The evidence associated with each model is compared using BMS and/or reduction, to identify the most likely structure that accounts for the data. Image credits: MRI scanner by Grant Fisher, TN, screen in (a) and (e) by Dinosoft Labs, PK, all from the Noun Project, CC BY 3.0. (Online version in colour.)
(a). Experimental design and neuroimaging
A DCM study begins by carefully articulating hypotheses and designing an experiment to test them. Typically, to maximize efficiency, there will be two or more independent experimental manipulations at the within-subject level, forming a factorial design. There may be additional experimental factors at the between-subject level, for instance to investigate differences between patients and controls, which will inform the strategy for sampling participants from the wider population. The hypotheses determine the choice of neuroimaging modality (e.g. M/EEG, fMRI), as well as the data features that will be selected and the type of dynamic causal model that will be used.
(b). Feature selection
The next step is to select features in the collected data that are important (i.e. informative) from a modelling standpoint. This is known as feature selection or extraction. For example, averaging time series over trials in response to stimulation gives event-related potentials (ERPs) [37], or neuroimaging data can be represented in the frequency [38,39] or time-frequency domain [29].
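As a minimal sketch of this step—assuming a hypothetical trials × channels × time array rather than any particular dataset or SPM routine—trial averaging and spectral estimation might look as follows:

```python
import numpy as np
from scipy import signal

# Hypothetical dataset: 64 trials x 4 channels x 1000 samples at 250 Hz.
rng = np.random.default_rng(0)
trials = rng.standard_normal((64, 4, 1000))
fs = 250.0

# Event-related potential (ERP): average the time series over trials.
erp = trials.mean(axis=0)                        # channels x time

# Frequency-domain feature: Welch estimate of the power spectral density,
# computed per trial and channel, then averaged over trials.
freqs, psd = signal.welch(trials, fs=fs, nperseg=256, axis=-1)
psd = psd.mean(axis=0)                           # channels x frequencies
```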
(c). Model specification
The hypotheses are then formally expressed in terms of k distinct and biologically informed (state space) model architectures M = {m1, … ,mk}, each describing possible interactions between experimental inputs and neuronal dynamics. In effect, a model in DCM can be understood as a dynamical system distributed on a graph, where the neuroimaging data capture the activity of each node (either directly, as in fMRI, or indirectly via some mapping to sensor space, as in M/EEG). Depending on the scale and fidelity of the neuroimaging measurement, each node could, in principle, correspond to a compartment of a neuron, or to an individual neuron, or (more typically) to the average activity of millions of neurons constituting a neuronal population or a brain region. At any given scale, connections between nodes in this graph are referred to as the effective connectivity, namely the effect of one node on another.
In general, partially observed discrete-continuous dynamical systems (which commonly arise in many mathematical and engineering applications) are well suited for modelling neural interactions [40]. The generative (state space) model in DCM can be written as follows [20]:
$$\dot{z} = f(z, U, \theta^{(f)}), \qquad Y = g(z, \theta^{(g)}) + c(\beta_0) + \varepsilon \tag{2.1}$$
The first part of equation (2.1) governs the dynamics of interactions (coupling) within and between nodes of the coupled dynamical system, where z are (usually unobservable or hidden biological) states with a flow that depends upon parameters θ(f) and exogenous or experimental input U. When the exogenous inputs are random fluctuations or innovations, equation (2.1) becomes a stochastic differential equation. The choice of coupling function f (parametrized in terms of extrinsic and intrinsic coupling) is usually motivated by biological principles (e.g. [17,41]). The second part of equation (2.1) is known as the observer function, and links (usually observable neuroimaging) data Y to the hidden or latent variables (e.g. [10,16]), where the function g models the contribution of the hidden states (depending upon parameters θ(g)) to the data. The second term in the observer equation, c(β0), models confounding signal components (e.g. drift), where c(.) is typically a general linear model with parameters β0 (e.g. the mean of the signal) [20,42]. In the observer model, ε denotes the measurement error, which is conventionally modelled as a zero-mean independent and identically distributed (i.i.d.) process, the covariance of which is estimated from the data. Hereinafter, unknown parameters in equation (2.1) are denoted by θ, which includes the model parameters θ(f), the observation function parameters θ(g) and the parameters λ that model the covariance of the observation noise (detailed in the supplementary material). The dynamics of each node are governed by (a set of) differential equations and the nodes are connected to each other. In effect, DCM estimates the parameters associated with the dynamics (i.e. differential equations) of each node and the coupling between them. Many forms of model have been used in DCM, for example biophysical haemodynamic models [4,16,43–45], neural mass [17,31,46] and field models [47] and weakly coupled oscillators [9].
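As a toy instance of equation (2.1), the sketch below simulates a two-node system with a bilinear flow—the functional form used in DCM for fMRI [4]—and a linear observer with a constant confound and i.i.d. Gaussian noise. The coupling matrices, input function and noise level are illustrative assumptions, not values from any fitted model:

```python
import numpy as np

# Toy two-node system in the form of equation (2.1):
#   dz/dt = (A + u(t) * B) @ z + C * u(t)      (bilinear flow f)
#   Y     = g(z) + c(beta0) + eps              (linear observer)
A = np.array([[-1.0, 0.0],
              [0.5, -1.0]])      # fixed coupling: node 1 drives node 2
B = np.array([[0.0, 0.0],
              [0.3, 0.0]])       # context-sensitive change in coupling
C = np.array([1.0, 0.0])         # exogenous input enters node 1

def u(t):
    """Boxcar experimental input: 10 s on, 10 s off."""
    return 1.0 if (t % 20.0) < 10.0 else 0.0

dt, T = 0.01, 40.0
times = np.arange(0.0, T, dt)
z = np.zeros(2)
Z = np.empty((len(times), 2))
for i, t in enumerate(times):    # Euler integration of the flow
    Z[i] = z
    z = z + dt * ((A + u(t) * B) @ z + C * u(t))

# Observer: identity mapping g, constant confound c and i.i.d. noise eps.
rng = np.random.default_rng(1)
beta0 = 0.1
Y = Z + beta0 + 0.05 * rng.standard_normal(Z.shape)
```

Note how, through B, the effective coupling from node 1 to node 2 changes whenever the experimental input is on—the context sensitivity described above.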
(d). Model identification
Given a prior probability density over the unknown parameters, p(θ), initial states, z(0) and neuroimaging data Y, DCM is used to infer the posterior probability of the parameters of each model, using a gradient ascent on variational free energy, F, as its cost function (see the electronic supplementary material for further information). The free energy, also known as the evidence lower bound in machine learning, is a lower bound on the log model evidence (or marginal likelihood) p(Y|m) (m denotes the model). In general, log model evidence is an unknown value, which can be decomposed as follows [20,48]:
$$\ln p(Y \mid m) = F + D_{\mathrm{KL}}\!\left(q(\theta)\,\|\,p(\theta \mid Y)\right) \tag{2.2}$$
where DKL(q(θ) ‖ p(θ|Y)) is the Kullback–Leibler (KL) divergence between the approximate and true posterior over the parameters, denoted by q(θ) and p(θ|Y), respectively. F is the (variational) free energy, defined as the difference between the accuracy of the model (i.e. the expected log likelihood, Eq(θ)[ln p(Y|θ, m)]) and the complexity of the model (i.e. the KL divergence between the approximate posterior q(θ) and the prior p(θ|m) over the parameters) [20,49]:
$$F = \mathbb{E}_{q(\theta)}\!\left[\ln p(Y \mid \theta, m)\right] - D_{\mathrm{KL}}\!\left(q(\theta)\,\|\,p(\theta \mid m)\right) \tag{2.3}$$
Given that the log model evidence—the left-hand side of equation (2.2)—is fixed (although unknown), by maximizing the free energy we minimize the divergence between the approximate and true posterior. The free energy F scores the goodness of a hypothesis (i.e. model), which can be employed for model comparison and for inferring model parameters—where the approximate posterior density q(θ) quantifies beliefs about the parameters. To identify the setting of the parameters that maximizes free energy, DCM uses an estimation scheme known as variational Laplace, namely variational Bayes under the Laplace approximation (i.e. the prior and posterior densities of latent variables have Gaussian distributions). Intuitively, maximizing the free energy offers a form of regularization, because the KL divergence term (complexity) in equation (2.3) acts as a penalty. Therefore, as the number of parameters increases—or the (approximate) posterior over parameters deviates from the prior—the free energy decreases. This finesses the risk of overfitting or implausible parameter estimates. The KL term in equation (2.3) accounts for both the expected values of the parameters and their covariance, providing a closer approximation of the log evidence than the Akaike information criterion or Bayesian information criterion, which define complexity as (effectively) the number of parameters, without considering their posterior covariance [50].
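Under the Laplace assumption, the complexity term in equation (2.3) is a KL divergence between two multivariate Gaussians, which has a closed form. The sketch below—with made-up prior and posterior moments—illustrates how both deviation from the prior and posterior precision contribute to the penalty:

```python
import numpy as np

def gaussian_kl(mu_q, S_q, mu_p, S_p):
    """KL( N(mu_q, S_q) || N(mu_p, S_p) ): the complexity term in
    equation (2.3) when prior and posterior are multivariate normal."""
    d = len(mu_q)
    iS_p = np.linalg.inv(S_p)
    dmu = mu_q - mu_p
    return 0.5 * (np.trace(iS_p @ S_q) + dmu @ iS_p @ dmu - d
                  + np.linalg.slogdet(S_p)[1] - np.linalg.slogdet(S_q)[1])

# A posterior that moves further from the prior, or becomes more
# confident (smaller covariance), incurs a larger complexity penalty.
mu0, S0 = np.zeros(2), np.eye(2)                   # prior
mu1, S1 = np.array([0.5, -0.2]), 0.5 * np.eye(2)   # approximate posterior
complexity = gaussian_kl(mu1, S1, mu0, S0)
# F = expected_log_likelihood - complexity          (equation (2.3))
```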
To summarize, model identification in DCM is carried out by gradient ascent of a variational free energy functional under the Laplace approximation (i.e. both prior and posterior densities over parameters have a Gaussian form) to identify the parameters of dynamical systems distributed on a graph. Usually, the form of the (ordinary, partial, stochastic or delay) differential equations and priors over their parameters are chosen to embody biophysical constraints. These constraints usually guarantee a degree of dynamical stability, which is important during model inversion, particularly when using approximate Bayesian inference. This is because brittle or highly nonlinear systems violate the Laplace assumption inherent in the inversion schemes commonly used in DCM. Generally, the dynamics on each node have simple point attractors (associated with low-order Taylor expansions of nonlinear differential equations) [51]. Having said this, it is possible to consider autonomous dynamics on each node using phase reduction techniques. For example, DCM for phase coupling uses hidden neuronal states that are the phase of weakly coupled oscillators, with implicit (quasi-) periodic dynamics [9,52].
The scheme described above provides a systematic way of handling the many scenarios where model parameters change over time. The simplest example would be the condition-specific effects modelled by bilinear coupled dynamical systems [44], which were first introduced in the context of DCM for fMRI [4]. With time-varying inputs, bilinear effects entail that the coupling (effective connectivity) is itself a function of time and therefore context-sensitive. Another important example arises in the case of modelling epileptic seizures, where a slow drift of physiological parameters might give rise to changes of brain states, from apparently normal to pathological conditions (e.g. via phase transitions and bifurcations) [34]. In this case, hierarchical linear regression of the parameters, estimated over successive windows of data, can be employed to identify coupling that varies with specified experimental variables [53]. The first level of the hierarchical model corresponds to the coupling estimated within each time window. The second level of the model uses the ensuing coupling (from the first level) to model changes (here fluctuations) of the parameters across windows. This approach has proven useful in the analysis of longitudinal resting state fMRI data in human neurosurgery patients [54] and for modelling seizure activity in rodent [55] and zebrafish [34] models.
These model identification methods facilitate testing hypotheses in an experimental setting. They also enable the validation of new models using the DCM framework. Conventionally, models in DCM are validated in three ways: (i) face validity, where simulated data is used to ensure known model parameters (and structures) can be recovered following model inversion, (ii) construct validity, where the inferences from DCM are compared with other (comparable) approaches, and (iii) predictive validity, which tests whether the posterior predictions of DCM reflect a known or expected effect, e.g. a pharmacological effect or diagnostic group membership.
(e). Structure learning
Having inverted a model to obtain its evidence and parameters, the next step is to ask whether the structure of the model could be simplified to further optimize the variational free energy. Recall that the free energy quantifies the trade-off between the accuracy and complexity of the model—so if a change to the network structure increases free energy, then the model has become more accurate (better fitting the data) and/or less complex (simpler in terms of its parametrization). This process of selecting between network architectures (a.k.a. structure learning) depends on BMS, namely the selection among different models based on their evidence. This process can be performed automatically and rapidly over potentially thousands of alternative models, using an approach called Bayesian model reduction.
The hypotheses or models m1, m2, … ,mk with free energy F1, F2, … ,Fk, can be compared using Bayesian model comparison. For any two inverted models, mi and mj with free energies of Fi and Fj, respectively, the log Bayes factor, log Bij is defined as follows [5,56]:
$$\ln B_{ij} = \ln p(Y \mid m_i) - \ln p(Y \mid m_j) \approx F_i - F_j \tag{2.4}$$
By estimating the parameters of each model (thereby maximizing the free energy), the KL divergence between the approximate and true posterior of the parameters vanishes in the limit [49,50]. Therefore, the log Bayes factor can be approximated as the difference between free energies of the models. Conventionally, a log Bayes factor above three indicates that there is ‘strong evidence’ for model mi over model mj [5,56]. Bayesian model comparison thereby allows pairwise model comparison, which in turn can be used to identify which of two models best accounts for the data. The posterior probability for each model can then be computed by application of Bayes rule. Under equal priors for each model, this simplifies to a logistic (sigmoid) function of the log Bayes factor:
$$p(m_i \mid Y) = \frac{1}{1 + \exp(-\ln B_{ij})} \tag{2.5}$$
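Numerically, equations (2.4) and (2.5) amount to a softmax over free energies. A minimal sketch, with made-up free energy values:

```python
import numpy as np

# Free energies of three inverted models (made-up values).
F = np.array([-120.3, -118.1, -125.0])

# Log Bayes factor comparing models 0 and 1 (equation (2.4)).
logB_01 = F[0] - F[1]

# Posterior model probabilities under equal priors: a softmax of the
# free energies, which reduces to equation (2.5) for two models.
p = np.exp(F - F.max())
p = p / p.sum()
```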
This process is easily generalized to comparisons with more than two models, by computing the log Bayes factor of each model relative to any one of the models in the comparison set. In some studies, rather than one model corresponding to one hypothesis, it can make more sense for a set or family of models to represent a particular hypothesis. This generalization of BMS—where the model space is grouped into several classes or families—is referred to as family-wise model comparison [5]. To explain this, let the model space M = {m1, m2, … ,mk} be grouped into r disjoint families M1, … ,Mr, where M = M1 ∪ ⋯ ∪ Mr and Mi ∩ Mj = ∅ for i ≠ j. In this case, the prior probability of each family is 1/r. Consequently, the prior probability of a model mj belonging to family Mi (with cardinality li) is 1/(r li). By applying Bayes' rule over the space of all models, one can calculate the posterior probability of model mj with evidence p(Y|mj), as follows:
$$p(m_j \mid Y) = \frac{p(Y \mid m_j)\,p(m_j)}{\sum_{n=1}^{k} p(Y \mid m_n)\,p(m_n)}, \qquad p(m_j) = \frac{1}{r\,l_i} \;\; \text{for } m_j \in M_i \tag{2.6}$$
By definition, the posterior probability for each family (i.e. class) is the sum of the posterior probabilities of its constituent models. The ensuing family posterior probabilities can then be compared using Bayesian model comparison. One ubiquitous application of family comparison is to compare models with and without a common feature or property—for example, with or without a particular parameter. In this case, BMS can be performed on the model space comprising two groups of models, where the properties of interest appear in only one group. This can be used to establish whether such a property is necessary to explain the data at hand.
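A minimal sketch of family-wise comparison, assuming four models with made-up free energies partitioned into two families, with priors assigned as described above:

```python
import numpy as np

F = np.array([-120.3, -118.1, -125.0, -119.4])   # made-up free energies
families = [[0, 1], [2, 3]]                      # r = 2 disjoint families

# Flat prior over families (1/r), shared equally within each family,
# giving each model a prior of 1/(r * l_i).
prior = np.zeros(len(F))
for fam in families:
    prior[fam] = 1.0 / (len(families) * len(fam))

# Posterior over models (equation (2.6)), then summed within families.
joint = np.exp(F - F.max()) * prior
p_model = joint / joint.sum()
p_family = np.array([p_model[fam].sum() for fam in families])
```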
By using the statistics detailed above, one can assign a probability to each of a predefined set of hypotheses about the structure of the neural network and draw conclusions based upon a small number of plausible architectures. An alternative approach to structure learning is to apply Bayesian model comparison in what can be called a discovery mode. By automatically searching over potentially thousands of alternative model architectures, one can ask whether eliminating any sub-structure (i.e. subset of parameters) of the model would increase the free energy relative to the original (full) model. This is made possible by a recent development known as Bayesian model reduction (BMR) [6,57]. Assume a (full) model is fitted to data Y with prior p(θ) and posterior p(θ|Y) with the parameter vector θ. Using Bayes’ rule (we have dropped the dependency on the model m for clarity):
$$p(\theta \mid Y) = \frac{p(Y \mid \theta)\,p(\theta)}{p(Y)} \tag{2.7}$$
where the model evidence is p(Y) = ∫ p(Y|θ) p(θ) dθ, the log of which is approximated by the variational free energy. Crucially, the priors determine which parameters (e.g. connections) should be informed by the data. Having estimated the parameters and free energy of the model, which we will refer to as the ‘full' model, BMR provides an analytic and rapid technique for evaluating the relative evidence for an alternative model, which differs only in terms of its priors, p̃(θ). Typically, this alternative set of priors will fix certain parameters to their prior mean, thereby reducing or pruning the model structure. For this reduced model, the approximate posterior of the parameters under the reduced priors is again given by Bayes' rule:
$$\tilde{p}(\theta \mid Y) = \frac{p(Y \mid \theta)\,\tilde{p}(\theta)}{\tilde{p}(Y)} \tag{2.8}$$
The likelihood function p(Y|θ) for the reduced and full models is the same, which enables equations (2.7) and (2.8) to be linked as follows:
$$\tilde{p}(\theta \mid Y) = p(\theta \mid Y)\,\frac{\tilde{p}(\theta)}{p(\theta)}\,\frac{p(Y)}{\tilde{p}(Y)} \tag{2.9}$$
Next, to find the evidence of the reduced model, both sides of equation (2.9) are integrated over the parameter space. Using the fact that ∫ p̃(θ|Y) dθ = 1, the model evidence for the reduced model is as follows:
$$\tilde{p}(Y) = p(Y) \int p(\theta \mid Y)\,\frac{\tilde{p}(\theta)}{p(\theta)}\,\mathrm{d}\theta \tag{2.10}$$
Equations (2.9) and (2.10) have analytic solutions given Gaussian forms for prior and posterior densities, meaning that the coupling parameters and evidence for reduced models can be derived from those of the full model in milliseconds, on a typical desktop computer [58]. This speed is leveraged in DCM for automatically discovering an optimal coupling structure for a network. In this setting, a greedy search is used, which iteratively generates candidate reduced models with different priors [59]. Their evidence is evaluated using BMR, and Bayesian model comparison is used to assess whether they should be retained or discarded. The ensuing coupling parameters from the best candidate models are averaged (using Bayesian model averaging based upon the evidence for each model) and returned as the optimal network structure for those data.
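To make the Gaussian case concrete, the following is a from-scratch sketch of the closed-form solution to equations (2.9) and (2.10)—not SPM's own routine—where the toy prior and posterior densities are illustrative assumptions:

```python
import numpy as np

def bayesian_model_reduction(mu0, S0, mu, S, mu0_r, S0_r):
    """Closed-form BMR for Gaussian densities (equations (2.9)-(2.10)).
    Full prior N(mu0, S0), full posterior N(mu, S) and reduced prior
    N(mu0_r, S0_r). Returns the reduced posterior moments and the
    change in log evidence, ln p~(Y) - ln p(Y)."""
    P0, P, P0_r = map(np.linalg.inv, (S0, S, S0_r))
    P_r = P + P0_r - P0                      # reduced posterior precision
    S_r = np.linalg.inv(P_r)
    mu_r = S_r @ (P @ mu + P0_r @ mu0_r - P0 @ mu0)
    dF = 0.5 * (np.linalg.slogdet(P)[1] + np.linalg.slogdet(P0_r)[1]
                - np.linalg.slogdet(P0)[1] - np.linalg.slogdet(P_r)[1])
    dF -= 0.5 * (mu @ P @ mu + mu0_r @ P0_r @ mu0_r
                 - mu0 @ P0 @ mu0 - mu_r @ P_r @ mu_r)
    return mu_r, S_r, dF

# Prune the second of two parameters by shrinking its prior variance
# towards zero, fixing it at its prior mean of zero.
mu0, S0 = np.zeros(2), np.eye(2)                  # full prior
mu, S = np.array([0.8, 0.05]), 0.1 * np.eye(2)    # full posterior
S0_r = np.diag([1.0, 1e-8])                       # reduced prior
mu_r, S_r, dF = bayesian_model_reduction(mu0, S0, mu, S, np.zeros(2), S0_r)
# Here dF > 0: the evidence favours the reduced (pruned) model, because
# the second parameter's posterior (0.05) is consistent with zero.
```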
It is worth mentioning that, in our experience, a greedy search over reduced models performs well, and gives similar results to manually defined sets of reduced models [59]. Nevertheless, other search/optimization algorithms could be considered and their performance compared. The use of a greedy search is purely for computational expediency, in situations where the number of combinations of different parameters—that constitute distinct models—becomes prohibitive. When there are a reasonable number of models (e.g. in the hundreds), the model comparison (or reduction) can use an exhaustive search.
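As an illustration of such a search, the sketch below exhaustively scores every combination of keeping or pruning two candidate parameters; it reuses the bayesian_model_reduction function and the full-model moments (mu0, S0, mu, S) defined in the previous sketch:

```python
import itertools
import numpy as np

# Exhaustively score every combination of keeping or pruning the two
# candidate parameters from the sketch above.
scores = {}
for keep in itertools.product([True, False], repeat=2):
    S0_r = np.diag([1.0 if k else 1e-8 for k in keep])
    _, _, dF = bayesian_model_reduction(mu0, S0, mu, S, np.zeros(2), S0_r)
    scores[keep] = dF

best = max(scores, key=scores.get)   # structure with the greatest evidence
# A greedy variant prunes one parameter at a time instead, keeping each
# pruning only if it increases the (reduced) free energy.
```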
Bayesian model comparison may be contrasted against another method in the modelling literature—surrogate data testing, where the statistics of different data are compared (for instance, the statistics of empirical time series are compared against a null set of time series, generated by a process that lacks a particular parameter) [60]. However, Bayesian model comparison operates at the level of models rather than summary statistics of the data, to provide a straightforward and efficient method for comparing models (without the need for sampling, if variational methods are used). Bayesian model comparison properly accommodates both model accuracy and complexity, meaning that any conclusions one draws will adhere to Occam's principle, i.e. the simplest explanation for the data should be favoured.
(f). Drawing conclusions
Together, the procedures described above constitute all the necessary tools for learning the optimal structure of a coupled dynamical system—as evinced by some data. We started by defining the free energy, a scalar functional that quantifies the trade-off between the accuracy and complexity of a model. This quantity is estimated in the context of a ‘full' model with all coupling parameters of interest. Then, using BMR, the free energies of reduced models are derived analytically—either for a few specified models—or by performing an automatic greedy search over potentially thousands of reduced models. Bayesian model comparison is used in both cases, to evaluate which model(s) should be favoured as explanations for the underlying generators of the data.
This concludes our overview of the basic approach to coupled dynamical systems. In what follows, we provide two worked examples to illustrate the sorts of inference and structure learning that are afforded. Although these examples reflect our interests (i.e. physiological coupling in the brain), the analyses can, in principle, be applied to any coupled dynamical system that can be articulated in terms of (stochastic, ordinary or delay) differential equations.
3. Worked examples
To illustrate the theory reviewed in this paper, we consider modelling the neurovascular coupling system, which ensures that brain cells (neurons) are adequately perfused with oxygenated blood in an activity-dependent fashion. This illustrates dynamical coupling on two levels: first, the coupling between two different physiological systems (neuronal and haemodynamic systems); second, the coupling among remote neuronal populations that underwrites distributed neuronal processing and representations. Characterizing the functional architecture of neurovascular coupling furthers our understanding of ischaemic brain injury (i.e. stroke) and, in research, establishes the origin of the fMRI time series used in brain mapping. In short, neurovascular coupling determines how blood dynamics are altered by neuronal demands for energy [42,61]. The challenge in this field is measuring the activity of this system non-invasively in the human brain, and therefore modelling-based approaches have been widely applied.
In the following example, a coupled (biophysically informed) dynamical system that models the behaviour of neuronal responses and the vascular system was employed. This is shown for a single brain region (one node of the coupled system) in figure 2. In this model, a canonical microcircuit (CMC) accounts for laminar-specific connections in a small area of cortex known as a cortical column. Dynamic causal models at this level of (neuronal) detail usually require fast electrophysiological measurements such as M/EEG [9]. Blood vessel dynamics caused by fast neuronal activity are captured by a haemodynamic model, the parameters of which are generally inferred using functional MRI [16]. The intermediate system linking (fast) neuronal and (slow) haemodynamic responses is known as neurovascular coupling, which is the focus of the modelling work described below.
Figure 2.
Macroscale model of neuronal to vascular pathway dynamics in a single brain region. This model [10] couples neuronal electrical activity (canonical micro circuit, CMC) with neurovascular units (e.g. astrocytes), which regulate/alter changes in blood flow and deoxygenation (the haemodynamic response). The CMC model comprises four populations: (1) spiny stellate cells, (2) superficial pyramidal cells, (3) inhibitory interneurons and (4) deep pyramidal cells. Each population is coupled to other populations via excitatory or inhibitory connections. In addition, a self-inhibitory connection—illustrated as a circular arrow—exists for each population (including excitatory populations) that models the gain or sensitivity of each population to its inputs, which is an inherent property of the dynamics of neuronal populations. EEG or MEG capture neuronal responses (modelled by interconnected CMC models), which may be distorted by the scalp—and are typically mixed with the activity of other nodes or sources. The fMRI signal derives from the haemodynamic part of the model. Changes in activity of neuronal populations (pre- or postsynaptic potentials) excite neurovascular coupling that in turn causes (e.g. by the release of nitric oxide) changes in blood flow. This is accompanied by changes in blood volume and a reduction in the level of deoxyhaemoglobin (dHb) in blood vessels, which give rise to the blood-oxygen-level-dependent (BOLD) response, measured using fMRI. Image credits: MRI scanner by Grant Fisher, TN from the Noun Project, CC BY 3.0. (Online version in colour.)
It is worth emphasizing that the kind of model inversion problem illustrated here is extremely challenging, which is why it has driven multiple technical and theoretical developments. The single region model presented here comprises 12 unobserved biological states that operate on different time scales (the time resolution of the neuronal system and vascular dynamics are milliseconds and seconds, respectively) with unknown coupling, as well as unknown parameters that link neuronal and vascular parameters to the observed fMRI and EEG recordings. The Bayesian methods reviewed here provide a useful basis for addressing profoundly ill-posed problems in structure learning of coupled dynamical systems.
4. Worked example 1: Bayesian model reduction for structure learning
The aim of the first example is to showcase an application of BMR to modelling the neurovascular system using fMRI time series [21]. This analysis used an fMRI dataset from a previously conducted experiment investigating the neural response to attention to visual motion [22]. Three brain regions (i.e. nodes) were identified that showed significant experimental effects (visual areas V1, V5 and the frontal eye fields) and representative time series were extracted from each region (i.e. feature selection). To explain the underlying generators of these time series, a DCM was specified comprising three canonical microcircuits (modelling within-region connectivity) that were coupled by between-region connections, creating a hierarchy of distributed processing nodes. The (presynaptic) input signal to each neuronal population in the CMC model comprises three components: two signals that are received from excitatory and inhibitory populations, and a third that is the summation of extrinsic inputs from distal regions. A weighted sum of these three signals (per neuronal population) forms the input to the haemodynamic model. The weights are inferred using DCM, and BMR is then used to identify the optimal reduced coupling structure—the minimal set of neurovascular parameters—that best explains the data.
The parameters of this model were estimated using the fMRI data (model identification), and those relating to neurovascular coupling are shown in figure 3a. Next, BMR was applied, using an automatic greedy search over reduced models, to ask whether there was any sub-structure in the neurovascular model parameter space that could equally well account for the data, relative to the full model. The optimal reduced model is shown in figure 3b, where inhibitory signals to three of the neuronal populations played a predominant role. From this single subject's data, one could conclude that the origin of the blood-oxygen-level-dependent (BOLD) signal can be primarily explained by the contribution of inhibitory inputs to superficial pyramidal cells, deep pyramidal cells and spiny stellate cells. In other words, the origin of the BOLD response is linked to the activity of inhibitory populations. This result was a technical illustration, and confirming it would require a group study with a representative sample of the population. Nevertheless, it demonstrates that DCM and BMR enable the investigation of architectural questions about the underlying generators of fMRI time series using non-invasive recordings. In particular, the use of BMR with an automatic search allowed a fast and efficient search over a large model space.
Figure 3.
Structure learning of neurovascular coupling using BMR. Panel (a) shows the posterior density over the neurovascular coupling parameters, modelled as the scaled sum of inhibitory (negative), excitatory (positive) and extrinsic signals to each population (i.e. SP, superficial pyramidal neuron; SS, spiny stellate cell; II, inhibitory interneuron; DP, deep pyramidal neuron). In panel (b), Bayesian model reduction (BMR) was performed using the posterior estimates of the parameters to identify a subset of parameters that best accounted for the data. The bar plots show the posterior expectations and 90% credible intervals of the neurovascular coupling parameters. Positive values indicate a positive contribution of a particular neuronal population to the overall vasodilatory signal, and negative values indicate a negative contribution of a neuronal population. The numbers associated with each population [1–4] correspond to the four neuronal populations in figure 2. For this single subject's data, inhibitory inputs to the superficial and deep pyramidal populations had a negative influence on the haemodynamic response. Smaller positive influences on the haemodynamic response were due to inhibitory inputs to spiny stellate cells and extrinsic inputs to deep pyramidal cells. (Online version in colour.)
5. Worked example 2: Bayesian fusion
This second study focused on Bayesian fusion across neuroimaging modalities—MEG and fMRI. Data were collected using each modality, under the same cognitive task (an auditory oddball paradigm), to inform the neuronal and neurovascular/haemodynamic parts of the model, respectively [62]. In the previous example, the neuronal part of the model had various parameters fixed (e.g. synaptic time constants), to estimate the neurovascular parameters using only fMRI data. In this second study, the objective was to develop a modelling scheme that uses the high temporal resolution of MEG to infer the parameters of the neuronal part of the model, before using the fMRI data to infer parameters of the neurovascular/haemodynamic part. BMS was then used to infer the optimal reduced structure of the combined MEG–fMRI-informed model.
To enable flexible coupling of different neural and neurovascular models, this study introduced an interface between them called neuronal drive functions (figure 4). These play a crucial role in the following analysis. First, active brain regions are identified from the fMRI data using standard methods (i.e. SPM). The coordinates of activated regions are then used as spatial priors in the specification of the DCM for MEG model. By inverting this model, DCM for MEG identifies the generators of event-related potentials [31] in terms of functional architectures, namely condition-specific changes in intrinsic (within-region) and extrinsic (between-region) connectivity. Using the posterior expectations of the neuronal parameters, the canonical microcircuit DCM is then used to generate synaptic responses to each experimental condition (figure 4b)—the neural drive functions. Finally, these functions (aligned to the timing of stimuli in the fMRI experiment) are used to drive a haemodynamic model of BOLD responses (figure 4c) [62]. The contribution of each population to the BOLD response is parametrized (βn), where these parameters are estimated from the fMRI data. By specifying models with different parametrizations, this procedure enables comparison of different models of neurovascular coupling (BMS).
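The sketch below illustrates the final step under strong simplifying assumptions: hypothetical damped oscillations stand in for the population drive functions (which, in the actual pipeline, are generated from the MEG-estimated CMC parameters), and the β weights are illustrative rather than estimated:

```python
import numpy as np

dt = 0.01                                   # seconds
t = np.arange(0.0, 32.0, dt)

# Hypothetical neuronal drive (impulse response) per CMC population.
# Simple damped oscillations stand in for the MEG-derived functions.
drives = [np.exp(-t / tau) * np.sin(2.0 * np.pi * f * t)
          for tau, f in [(0.2, 8.0), (0.3, 10.0), (0.25, 6.0), (0.4, 4.0)]]

# Stimulus train from the fMRI paradigm: one event every 8 s.
stim = np.zeros_like(t)
stim[::int(8.0 / dt)] = 1.0

# Neurovascular signal s(t): beta-weighted sum of the convolved drives.
# The beta_n would be estimated from fMRI data; values here are made up.
beta = [0.5, 1.0, -0.3, 0.2]
s = sum(b * np.convolve(stim, d)[: len(t)] for b, d in zip(beta, drives))
```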
Figure 4.
Multi-modal DCM for MEG–fMRI pipeline. (a) The neural network is modelled with a canonical microcircuit (CMC) for each brain region (i.e. node). The parameters of the CMC model are estimated using DCM for MEG, with spatial priors determined by a mass-univariate regression analysis of the fMRI data. (b) Condition-specific neuronal response functions are simulated from the CMC model using the estimated parameters, associated with each population (P1, … ,P4) and each experimental condition. The top row shows (impulse) response functions from neuronal populations SS and SP, and the bottom row shows (impulse) response functions from populations II and DP, for a single experimental condition. (c) The inputs to the haemodynamic model are calculated by summing and temporally shifting (convolving) the neuronal response functions with the timeline of the experimental stimulation in the fMRI paradigm. The ensuing neuronal drives from each population are scaled by parameters β to form a neurovascular signal, s(t), which forms the input to the haemodynamic model. The haemodynamic model is fitted to the fMRI time series and hypotheses are tested by comparing the evidence for reduced models with different combinations of β parameters switched on or off. (Online version in colour.)
This study also provides an opportunity to illustrate the use of pre-defined model spaces. Whereas in the first example an automatic search was conducted over reduced models, here a set of carefully specified models were compared to test specific experimental questions or factors. The ensuing set of models—the model space—characterized neurovascular coupling in terms of four factors or questions. In brief, the factors were (i) whether there were presynaptic [44] or postsynaptic [61] contributions to the neurovascular signal, (ii) whether neuronal responses from distal regions excite neurovascular coupling [45], (iii) whether the neurovascular coupling function was the same across all regions or should be defined in a region-specific way [46], and finally (iv) whether a first- or second-order differential equation should be used for the dynamics of the neurovascular system to determine if there was any delay between neuronal activity and the ensuing BOLD response [47,48]. A total of 16 candidate models, with different combinations of these four factors, were evaluated using Bayesian model comparison. For each of the four experimental questions, the 16 models were grouped into families (e.g. all models with presynaptic versus postsynaptic input) and the probability for each family computed.
In this illustrative single subject study, the results of family-wise BMS on each group of models identified strong evidence (with nearly 100% posterior confidence) for the following:
(i) The BOLD effect was caused by presynaptic signals. This is in line with the findings of [44], which found that mean neuronal firing rates (presynaptic signals) induce BOLD signals.

(ii) Regional neuronal drives to haemodynamic responses induced vascular perfusion. This is consistent with the general consensus that local neuronal activity induces BOLD contrast.

(iii) The strength of neurovascular coupling was region-specific. This is in agreement with invasive recordings in animal studies, which suggest that neurovascular coupling varies from brain region to brain region [46].

(iv) The BOLD response to neuronal activity was instantaneous, rather than delayed.
This example illustrates some of the intricacies of structure learning using DCM and BMS. First, hypotheses about biophysical processes can be expressed formally as models, and in particular as a factorial design (i.e. a model space that can be partitioned among different factors or attributes). Second, data from different neuroimaging modalities can be combined using Bayesian fusion, where different parts of the model are informed by different modalities. For flexibility and efficiency of model inversion, neuronal drive functions were introduced to act as a bridge between neural and haemodynamic models. Finally, each experimental question can be addressed through family-wise Bayesian model comparison. In summary, these two examples illustrate the application of DCM, BMR and BMS to structure learning based on multi-modal neuroimaging data.
6. Discussion
In this paper, we have reviewed a suite of recently developed methods for structure learning in the context of coupled dynamical systems. In particular, we showcased applications in neuroscience using dynamic causal modelling (DCM)—the Bayesian inversion of biophysically informed coupled dynamical systems—and Bayesian model selection (BMS) and reduction (BMR) for assessing the evidence for different models or hypotheses. To date, these tools have mainly been applied in the context of cognitive and clinical neuroscience—to unravel the functional architectures underlying neuroimaging data. DCM offers an efficient way to estimate the parameters of large-scale dynamical systems, based on a gradient ascent of a variational free energy functional. This functional inherently scores different candidate architectures and coupling functions in terms of a trade-off between accuracy and model complexity.
In a general setting, DCM, BMS and BMR offer efficient pipelines for modellers to identify coupled dynamical systems in an evidence-based fashion. DCM has been applied to a wide range of problems, including parameter estimation for deterministic [63] and stochastic [43,64] dynamical systems using time, frequency or time-frequency domain information. In addition, DCM has been found to be useful for studying large networks, based on the centre manifold theorem [51], and for parameter estimation of dynamical systems on a manifold [65]. These examples demonstrate that structure learning based on DCM, BMR and BMS provides a general and efficient method that can be applied to a wide range of modelling problems involving real-world physical systems.
One might ask whether dynamic causal modelling has real-world clinical applications, beyond research. For instance, would it be possible to use DCM as part of a biological control system, to suppress or prevent unwanted activity in a diseased brain (e.g. epileptic seizures, or the symptoms of Parkinson's disease)? This question has been addressed in the setting of Parkinson's disease [66]; however, there is a long road ahead before these models are sufficiently well developed and validated to be used in the treatment of neurological disorders. In a research context, an avenue receiving much attention is how to validate theories of brain function based on predictive coding and, more generally, the Bayesian brain hypothesis—and, in particular, how to identify the mechanisms of information transfer between layers of cortex [18,19,36]. Here, tools such as the CMC model, variational Laplace and structure learning using BMS and BMR are likely to prove useful.
Data accessibility
Code demonstrating the methods described in this paper can be found in the freely available SPM software package (https://www.fil.ion.ucl.ac.uk/spm/). After installation, type DEM (short for demo) and press enter to view all available demos.
Authors' contributions
A.J. wrote the manuscript and prepared the figures. P.Z., V.L. and K.J.F. contributed to planning and editing the manuscript.
Competing interests
We declare we have no competing interests.
Funding
The Wellcome Centre for Human Neuroimaging is supported by core funding from Wellcome (grant no. 203147/Z/16/Z).
References
- 1. Bishop CM. 2006. Pattern recognition and machine learning. Berlin, Germany: Springer.
- 2. Beal MJ. 2003. Variational algorithms for approximate Bayesian inference. PhD thesis, University College London.
- 3. Savage LJ. 1972. The foundations of statistics. New York, NY: Dover Publications.
- 4. Friston KJ, Harrison L, Penny W. 2003. Dynamic causal modelling. Neuroimage 19, 1273–1302. (doi:10.1016/S1053-8119(03)00202-7)
- 5. Rosa MJ, Penny WD, Stephan KE, Schofield TM, Friston KJ, Leff AP. 2010. Comparing families of dynamic causal models. PLoS Comput. Biol. 6, e1000709. (doi:10.1371/journal.pcbi.1000709)
- 6. Friston K, Penny W. 2011. Post hoc Bayesian model selection. Neuroimage 56, 2089–2099. (doi:10.1016/j.neuroimage.2011.03.062)
- 7. Friston KJ, Ashburner J, Kiebel S, Nichols T, Penny WD. 2007. Statistical parametric mapping: the analysis of functional brain images. Amsterdam, The Netherlands: Elsevier.
- 8. Zaytseva Y, et al. 2018. Theoretical modeling of cognitive dysfunction in schizophrenia by means of errors and corresponding brain networks. Front. Psychol. 9, 1–22. (doi:10.3389/fpsyg.2018.01027)
- 9. Duzel E, Fuentemilla L, Penny WD, Friston K, Litvak V. 2009. Dynamic causal models for phase coupling. J. Neurosci. Methods 183, 19–30. (doi:10.1016/j.jneumeth.2009.06.029)
- 10. Daunizeau J, Kiebel SJ, Friston KJ. 2009. Dynamic causal modelling of distributed electromagnetic responses. Neuroimage 47, 590–601. (doi:10.1016/j.neuroimage.2009.04.062)
- 11. Wang W-X, Lai Y-C, Grebogi C. 2016. Data based identification and prediction of nonlinear and complex dynamical systems. Phys. Rep. 644, 1–76. (doi:10.1016/j.physrep.2016.06.004)
- 12. Evensen G. 2009. Data assimilation: the ensemble Kalman filter. Berlin, Germany: Springer.
- 13. Bittanti S. 2019. Model identification and data analysis. Hoboken, NJ: Wiley.
- 14. Siek MBLA. 2011. Predicting storm surges: chaos, computational intelligence, data assimilation, ensembles. Boca Raton, FL: CRC Press.
- 15. Friston KJ. 2011. Functional and effective connectivity: a review. Brain Connect. 1, 13–36. (doi:10.1089/brain.2011.0008)
- 16. Stephan KE, Weiskopf N, Drysdale PM, Robinson PA, Friston KJ. 2007. Comparing hemodynamic models with DCM. Neuroimage 38, 387–401. (doi:10.1016/j.neuroimage.2007.07.040)
- 17. David O, Friston KJ. 2003. A neural mass model for MEG/EEG: coupling and neuronal dynamics. Neuroimage 20, 1743–1755. (doi:10.1016/j.neuroimage.2003.07.015)
- 18. Friston K. 2009. The free-energy principle: a rough guide to the brain? Trends Cogn. Sci. 13, 293–301. (doi:10.1016/j.tics.2009.04.005)
- 19. Friston KJ, Stephan KE. 2007. Free-energy and the brain. Synthese 159, 417–458. (doi:10.1007/s11229-007-9237-y)
- 20. Friston KJ, Trujillo-Barreto N, Daunizeau J. 2008. DEM: a variational treatment of dynamic systems. Neuroimage 41, 849–885. (doi:10.1016/j.neuroimage.2008.02.054)
- 21. Friston KJ, Preller KH, Mathys C, Cagnan H, Heinzle J, Razi A, Zeidman P. 2017. Dynamic causal modelling revisited. Neuroimage 199, 730–744. (doi:10.1016/j.neuroimage.2017.02.045)
- 22. Büchel C, Friston K. 1997. Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cereb. Cortex 7, 768–778. (doi:10.1093/cercor/7.8.768)
- 23. Stankovski T, Duggento A, McClintock PVE, Stefanovska A. 2014. A tutorial on time-evolving dynamical Bayesian inference. Eur. Phys. J. Spec. Top. 223, 2685–2703. (doi:10.1140/epjst/e2014-02286-7)
- 24. Voss HU, Timmer J, Kurths J. 2004. Nonlinear dynamical system identification from uncertain and indirect measurements. Int. J. Bifurcation Chaos 14, 1905–1933. (doi:10.1142/S0218127404010345)
- 25. Dowlati E, Adams SE, Stiles AB, Moran RJ. 2016. Aging into perceptual control: a dynamic causal modeling for fMRI study of bistable perception. Front. Hum. Neurosci. 10, 1–12. (doi:10.3389/fnhum.2016.00141)
- 26. Dima D, Jogia J, Frangou S. 2014. Dynamic causal modeling of load-dependent modulation of effective connectivity within the verbal working memory network. Hum. Brain Mapp. 35, 3025–3035. (doi:10.1002/hbm.22382)
- 27. Heinzle J, Stephan KE. 2017. Dynamic causal modeling and its application to psychiatric disorders. In Computational psychiatry: mathematical modeling of mental illness (eds A Anticevic, JD Murray), pp. 117–144. London, UK: Academic Press.
- 28. Razi A, Zou J, Zeidman P, Zhou Y, Wang H, Friston KJ. 2017. Altered intrinsic and extrinsic connectivity in schizophrenia. NeuroImage Clin. 17, 704–716. (doi:10.1016/j.nicl.2017.12.006)
- 29. Chen CC, Kiebel SJ, Friston KJ. 2008. Dynamic causal modelling of induced responses. Neuroimage 41, 1293–1312. (doi:10.1016/j.neuroimage.2008.03.026)
- 30. Wu SC, Swindlehurst AL. 2013. Algorithms and bounds for dynamic causal modeling of brain connectivity. IEEE Trans. Signal Process. 61, 2990–3001. (doi:10.1109/TSP.2013.2255040)
- 31. Kilner JM, Harrison LM, Friston KJ, Kiebel SJ, Mattout J, David O. 2006. Dynamic causal modeling of evoked responses in EEG and MEG. Neuroimage 30, 1255–1272. (doi:10.1016/j.neuroimage.2005.10.045)
- 32. Papadopoulou M, et al. 2018. NMDA-receptor antibodies alter cortical microcircuit dynamics. Proc. Natl Acad. Sci. USA 115, E9916–E9925. (doi:10.1073/pnas.1804846115)
- 33. Papadopoulou M, Leite M, van Mierlo P, Vonck K, Lemieux L, Friston K, Marinazzo D. 2015. Tracking slow modulations in synaptic gain using dynamic causal modelling: validation in epilepsy. Neuroimage 107, 117–126. (doi:10.1016/j.neuroimage.2014.12.007)
- 34. Rosch RE, Hunter PR, Baldeweg T, Friston KJ, Meyer MP. 2018. Calcium imaging and dynamic causal modelling reveal brain-wide changes in effective connectivity and synaptic dynamics during epileptic seizures. PLoS Comput. Biol. 14, e1006375. (doi:10.1371/journal.pcbi.1006375)
- 35. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston KJ. 2012. Canonical microcircuits for predictive coding. Neuron 76, 695–711. (doi:10.1016/j.neuron.2012.10.038)
- 36. Friston KJ, Daunizeau J, Kilner J, Kiebel SJ. 2010. Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260. (doi:10.1007/s00422-010-0364-z)
- 37. Penny WD, Kiebel SJ, Kilner JM, Rugg MD. 2002. Event-related brain dynamics. Trends Neurosci. 25, 387–389. (doi:10.1016/S0166-2236(02)02202-6)
- 38. Moran RJ, Stephan KE, Seidenbecher T, Pape HC, Dolan RJ, Friston KJ. 2009. Dynamic causal models of steady-state responses. Neuroimage 44, 796–811. (doi:10.1016/j.neuroimage.2008.09.048)
- 39. Friston KJ, Bastos A, Litvak V, Stephan KE, Fries P, Moran RJ. 2012. DCM for complex-valued data: cross-spectra, coherence and phase-delays. Neuroimage 59, 439–455. (doi:10.1016/j.neuroimage.2011.07.048)
- 40. Valdes PA, Jimenez JC, Riera J, Biscay R, Ozaki T. 1999. Nonlinear EEG analysis based on a neural mass model. Biol. Cybern. 81, 415–424. (doi:10.1007/s004220050572)
- 41. Friston KJ, Mechelli A, Turner R, Price CJ. 2000. Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12, 466–477. (doi:10.1006/nimg.2000.0630)
- 42. Hawkins BT, Davis TP. 2005. The blood–brain barrier/neurovascular unit in health and disease. Pharmacol. Rev. 57, 173–185. (doi:10.1124/pr.57.2.4)
- 43. Daunizeau J, Stephan KE, Friston KJ. 2012. Stochastic dynamic causal modelling of fMRI data: should we care about neural noise? Neuroimage 62, 464–481. (doi:10.1016/j.neuroimage.2012.04.061)
- 44. Penny W, Ghahramani Z, Friston K. 2005. Bilinear dynamical systems. Phil. Trans. R. Soc. B 360, 983–993. (doi:10.1098/rstb.2005.1642)
- 45. Marreiros AC, Kiebel SJ, Friston KJ. 2008. Dynamic causal modelling for fMRI: a two-state model. Neuroimage 39, 269–278. (doi:10.1016/j.neuroimage.2007.08.019)
- 46. Lemaréchal J-D, George N, David O. 2018. Comparison of two integration methods for dynamic causal modeling of electrophysiological data. Neuroimage 173, 623–631. (doi:10.1016/j.neuroimage.2018.02.031)
- 47. Pinotsis DA, Moran RJ, Friston KJ. 2012. Dynamic causal modeling with neural fields. Neuroimage 59, 1261–1274. (doi:10.1016/j.neuroimage.2011.08.020)
- 48. Friston KJ. 2008. Variational filtering. Neuroimage 41, 747–766. (doi:10.1016/j.neuroimage.2008.03.017)
- 49. Friston K, Mattout J, Trujillo-Barreto N, Ashburner J, Penny W. 2006. Variational free energy and the Laplace approximation. Neuroimage 34, 220–234. (doi:10.1016/j.neuroimage.2006.08.035)
- 50. Penny WD. 2011. Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage 59, 319–330.
- 51. Friston KJ, Li B, Daunizeau J, Stephan KE. 2011. Network discovery with DCM. Neuroimage 56, 1202–1221. (doi:10.1016/j.neuroimage.2010.12.039)
- 52. Oostenveld R, et al. 2011. EEG and MEG data analysis in SPM8. Comput. Intell. Neurosci. 2011, 1–32. (doi:10.1155/2011/156869)
- 53. Friston KJ, Litvak V, Oswal A, Razi A, Stephan KE, van Wijk BCM, Ziegler G, Zeidman P. 2016. Bayesian model reduction and empirical Bayes for group (DCM) studies. Neuroimage 128, 413–431. (doi:10.1016/j.neuroimage.2015.11.015)
- 54. Park H-J, Pae C, Friston K, Jang C, Razi A, Zeidman P, Chang WS, Chang JW. 2017. Hierarchical dynamic causal modeling of resting-state fMRI reveals longitudinal changes in effective connectivity in the motor system after thalamotomy for essential tremor. Front. Neurol. 8, 346. (doi:10.3389/fneur.2017.00346)
- 55. Papadopoulou M, Cooray G, Rosch R, Moran R, Marinazzo D, Friston K. 2017. Dynamic causal modelling of seizure activity in a rat model. Neuroimage 146, 518–532. (doi:10.1016/j.neuroimage.2016.08.062)
- 56. Kass RE, Raftery AE. 1995. Bayes factors. J. Am. Stat. Assoc. 90, 773–795. (doi:10.1080/01621459.1995.10476572)
- 57. Rosa MJ, Bestmann S, Harrison L, Penny W. 2010. Bayesian model selection maps for group studies. Neuroimage 49, 217–224. (doi:10.1016/j.neuroimage.2009.08.051)
- 58. Friston K, Parr T, Zeidman P. 2018. Bayesian model reduction. arXiv preprint, 18 May 2018.
- 59. Zeidman P, Jafarian A, Seghier ML, Litvak V, Cagnan H, Price CJ, Friston KJ. 2019. A guide to group effective connectivity analysis, part 2: second level analysis with PEB. Neuroimage 200, 12–25. (doi:10.1016/j.neuroimage.2019.06.032)
- 60. Lancaster G, Iatsenko D, Pidde A, Ticcinelli V, Stefanovska A. 2018. Surrogate data for hypothesis testing of physical systems. Phys. Rep. 748, 1–60. (doi:10.1016/j.physrep.2018.06.001)
- 61. McConnell HL, Kersch CN, Woltjer RL, Neuwelt EA. 2017. The translational significance of the neurovascular unit. J. Biol. Chem. 292, 762–770. (doi:10.1074/jbc.R116.760215)
- 62. Jafarian A, Litvak V, Cagnan H, Friston KJ, Zeidman P. 2019. Neurovascular coupling: insights from multi-modal dynamic causal modelling of fMRI and MEG. arXiv preprint, 18 March 2019.
- 63. Stephan KE, Kasper L, Harrison LM, Daunizeau J, den Ouden HEM, Breakspear M, Friston KJ. 2008. Nonlinear dynamic causal models for fMRI. Neuroimage 42, 649–662. (doi:10.1016/j.neuroimage.2008.04.262)
- 64. Friston K, Penny W, Li B, Hu D, Daunizeau J, Stephan KE. 2011. Generalised filtering and stochastic DCM for fMRI. Neuroimage 58, 442–457. (doi:10.1016/j.neuroimage.2009.09.031)
- 65. Sengupta B, Friston K. 2017. Approximate Bayesian inference as a gauge theory. arXiv preprint.
- 66. Moran R. 2015. Deep brain stimulation for neurodegenerative disease. In Computational neurostimulation, vol. 222, pp. 125–146. Amsterdam, The Netherlands: Elsevier.