Abstract
This paper provides a worked example of using Dynamic Causal Modelling (DCM) and Parametric Empirical Bayes (PEB) to characterise inter-subject variability in neural circuitry (effective connectivity). It steps through an analysis in detail and provides a tutorial style explanation of the underlying theory and assumptions (i.e, priors). The analysis procedure involves specifying a hierarchical model with two or more levels. At the first level, state space models (DCMs) are used to infer the effective connectivity that best explains a subject's neuroimaging timeseries (e.g. fMRI, MEG, EEG). Subject-specific connectivity parameters are then taken to the group level, where they are modelled using a General Linear Model (GLM) that partitions between-subject variability into designed effects and additive random effects. The ensuing (Bayesian) hierarchical model conveys both the estimated connection strengths and their uncertainty (i.e., posterior covariance) from the subject to the group level; enabling hypotheses to be tested about the commonalities and differences across subjects. This approach can also finesse parameter estimation at the subject level, by using the group-level parameters as empirical priors. The preliminary first level (subject specific) DCM for fMRI analysis is covered in a companion paper. Here, we detail group-level analysis procedures that are suitable for use with data from any neuroimaging modality. This paper is accompanied by an example dataset, together with step-by-step instructions demonstrating how to reproduce the analyses.
Highlights
-
•
This guide walks through a group effective connectivity study using DCM and PEB.
-
•
It explains recently developed tools for hierarchical Bayesian modelling.
-
•
The appendices clarify the technical detail of the PEB framework and its priors.
-
•
An accompanying dataset is provided with step-by-step analysis instructions.
1. Introduction
Neuroimaging studies typically have a hierarchical form. At the first (within subject) level, the neural responses of individual subjects are inferred from measurements (e.g. fMRI, EEG, MEG) by specifying and fitting suitable models. The ensuing model parameters are then taken to the second (between subject) level, where the commonalities and differences across subjects are assessed. There may be further levels of the hierarchy; for example, each group of subjects may have been sampled from different populations. This naturally suggests the use of a hierarchical model that links individual subjects to the population(s) from which they were sampled. In this paper, we address the practicalities of hierarchical modelling in brain connectivity studies.
Dynamic Causal Modelling (DCM) is the predominant analysis framework for inferring effective connectivity; namely, the directed causal influences among neural populations that mediate an effect of one population on another (Friston et al., 2003). In this context, a model is a set of differential equations that describe the transformations from experimental stimulation through neural activity to observed data. The model has parameters, such as the strength of particular neural connections, which are estimated from the data using a Bayesian (probabilistic) method called Variational Laplace (Friston et al., 2007). This provides a probability density over the possible values of the parameters (e.g., connection strengths), as well as a score for the quality of the model, called the log-evidence or (negative) variational free energy.
Having inferred every subject's connectivity strengths, the next challenge is how to quantify the commonalities and differences across subjects. For example, one may wish to test for differences between a patient group and a control group, or investigate whether the dose of a drug alters certain connections, or whether there is a relationship between connection strengths and behavioural measures. To enable these kinds of hypotheses to be tested efficiently, DCM was recently supplemented with a hierarchical model over parameters, called the Parametric Empirical Bayes (PEB) framework (Friston et al., 2016). Readers familiar with mass-univariate analysis in neuroimaging (Statistical Parametric Mapping, SPM) will be accustomed to the ‘summary statistic’ approach, which begins with quantifying effects at the first or within-subject level, followed by a second level analysis to test whether these effects are conserved over subjects. The PEB framework enables a similar workflow for DCM, illustrated schematically in Fig. 1 in the context of the example fMRI experiment presented here. For each subject, a DCM is specified and a probability density over their parameters (e.g. connection strengths) is estimated from the data (Fig. 1, bottom). The parameters of interest are then collated and modelled at the second level using a General Linear Model (GLM), with any unexplained between-subject variability captured by a covariance component model. Therefore, individual differences in connection strengths are decomposed into hypothesised group-level effects, plus any unexplained random effects (RFX), illustrated in Fig. 1 (top). Having estimated the group-level parameters (e.g., group-average connection strengths), hypotheses can be tested by comparing the evidence for different mixtures of these parameters – a process called Bayesian model comparison. With the PEB approach, Bayesian model comparison is only introduced at the point at which a hypothesis is tested. In group studies, the hypothesis is, by definition, about between subject effects. Our focus is therefore on comparing different models of between-subject effects on (condition specific changes in) within-subject connectivity.
Fig. 1.
The two-level Parameter Empirical Bayes (PEB) model used in our analysis. At the first level (bottom of the figure), a forward model (DCM) describes how neural activity causes the fMRI timeseries of each subject . The parameters from the neural part of the model are taken to the group level (upper part of the figure), where they are modelled as a linear combination of a group mean, a group difference due to laterality index, covariates of no interest (not shown) and random effects (RFX) variability. Image credits: “Brain image” by parkjisun and “CT Scan” by Vectors Market from the Noun Project.
A simple alternative to using parametric empirical Bayes is to take the expected values of the estimated connectivity parameters from all subjects and run classical statistics such as ANCOVA or MANCOVA. The use of a Bayesian framework is distinct in several respects. As illustrated in Fig. 1, the estimated parameters in DCM are (multivariate normal) probability densities. Taking just the expected values – i.e. the peaks - from each subject to the group level would ignore the estimated covariance (uncertainty) of the parameters. By contrast, the PEB framework takes both the expected values and the covariance of the parameters to the group level (through conditional dependencies among random variables, detailed in Equation (4)). A useful implication is that more precise parameter estimates will have greater influence on the group-level result, so subjects with noisy data and uncertain parameter estimates will be down-weighted. Additionally, each level of the hierarchy serves as a prior on the parameters of the level below. This can, for example, enable better estimates at the single subject level, by incorporating knowledge of the experimental effects or connection strengths that are typical for the group. This could be important in settings where precise inferences about individual subjects are needed (e.g., when identifying potential biomarkers).
The PEB framework may be contrasted with an established approach for group-level modelling of effective connectivity; namely, Random Effects Bayesian Model Selection (RFX BMS), introduced in Stephan et al. (2009). With RFX BMS, multiple DCMs are specified per subject, where each DCM corresponds to one hypothesis about the network architecture. These DCMs are fitted to each subject's data. A hierarchical model is then defined, which estimates the relative probability that any randomly selected person from the population would have had their data generated by each DCM. The ‘random effect’ in this context is which model generated each subject's data; i.e., some subjects may have had their data generated by model 1, others by model 2, etc. This sort of random effect on models is typically used when subjects vary in their neural architecture in terms of the presence or absence of particular connections, or to compare different model architectures (e.g. different DCM forward models). The RFX BMS method has also benefited from extensive validation (Rigoux et al., 2014).
The PEB approach differs fundamentally, in that it is a hierarchical model with random effects on parameters rather than models. In other words, it assumes that all subjects have the same basic architecture, i.e. they can be explained by the same DCM forward model, but they differ in terms of the strength of connections within that model. For example, the PEB approach could be used to test for a decrease in effective connectivity due to the administration of a drug (i.e. a continuous difference in connectivity), or to test for a complete functional disconnection due to a lesion, in which case the relevant connectivity parameters would be expected to shrink towards zero. Practically, only one DCM is specified per subject and the estimated parameters are taken to the group level, where hypotheses about between subject effects are tested. With this approach, parameters may be treated as ‘fixed effects’ or ‘random effects’ (i.e., sampled from a wider population) by setting appropriate priors on between-subject variability in the hierarchical model. Unlike the RFX BMS approach, hypothesised between-subject effects can be continuous behavioural or clinical measures (e.g., a brain lateralisation index – LI – in this study), rather than the presence or absence of connections. Also, by comparing the free energy of group-level PEB models, hypotheses can be tested in terms of different mixtures of first level effects (connections) and second level effects (covariates), affording a flexibility in the questions that can be addressed. Crucially, by only inverting one ‘full’ DCM per subject in the PEB approach – rather than multiple DCMs per subject – one does not run into the problem of different DCMs falling into different local optima – a particular problem for neural mass models used with electrophysiological data.
To illustrate PEB in this setting, this paper walks through the analysis of a previously published fMRI study on the lateralisation of semantic processing, using DCM and PEB. The first level DCM analysis is described in the first part of the tutorial (in the companion paper), which focused on the specifics of fMRI data analysis. Here, we address the more generic issue of how to model commonalities and differences among subjects in effective connectivity at the group level, regardless of the imaging modality. After introducing the example dataset, we describe the specification of a PEB model and demonstrate testing hypotheses using Bayesian Model Reduction (BMR) – a particularly efficient form of Bayesian model selection. We conclude by illustrating how predictive validity can be assessed using cross-validation. To help associate methods with their implementation in the SPM software (http://www.fil.ion.ucl.ac.uk/spm/software/), MATLAB function names are provided in bold text, such as (spm_dcm_fit.m), and are listed in Table 1. The example dataset and a step-by-step guide to running these analyses are available from https://github.com/pzeidman/dcm-peb-example.
Table 1.
SPM software functions in the PEB framework.
| Function name | Analysis level | Description |
|---|---|---|
| spm_dcm_fit | 1st | Fits first level DCMs, arranged in a GCM array*, to the data using variational Laplace. |
| spm_dcm_peb_fit | 1st | Iteratively re-estimates each subject's first level DCM using the group average parameter estimates as empirical priors. |
| spm_dcm_bmc | 2nd | Performs fixed effects and random effects Bayesian model selection (RFX BMS) on DCMs (this is not required for PEB analysis). |
| spm_dcm_peb | ≥2nd | Specifies and estimates a PEB model. |
| spm_dcm_peb_bmc | ≥2nd | Compares an estimated PEB model against reduced models where particular combinations of parameters (relating to particular connections) are switched off. In the absence of pre-specified alternative models, an automatic search over reduced models is performed. (To compare different mixtures of covariates and pre-specified first level models, instead use spm_dcm_bmc_peb.) |
| spm_dcm_peb_bmc_fam | ≥2nd | Performs family-wise model comparison on the output of spm_dcm_peb_bmc, to enable groups of pre-defined models to be compared. |
| spm_dcm_peb_review | ≥2nd | Graphical user interface for reviewing a PEB model or model comparison result. |
| spm_dcm_loo | 1st and 2nd | Performs leave-one-out cross validation to assess the predictive validity of DCM parameters. |
* A GCM (Group DCM) array is a Matlab cell array of DCM structures or filenames, with one row per subject and one column per model. For most group analyses, the first column of the GCM is expected to contain each subject's ‘full’ model, which includes all parameters of interest, and subsequent columns contain reduced models with certain parameters fixed at their prior expectation (typically zero).
2. Experimental design
We used data from a previously published fMRI experiment on language lateralisation with 60 subjects (Seghier et al., 2011). To recap, this experiment investigated how the left and right frontal lobes interact during semantic (relative to perceptual) processing of words. While language is typically thought to be left lateralised, the right hemisphere also responds in language tasks, and this experiment focussed on individual differences in the degree of lateralisation. At the within-subject level, it was a balanced 2x2 factorial design with factors: stimulus type (Words or Pictures) and task (Semantic or Perceptual matching). There was also a single factor at the between-subject level: Laterality Index (LI). This is a measure of functional brain asymmetry, derived from the number of activated voxels in each hemisphere in an initial SPM analysis. A positive LI (towards +1) indicated left hemisphere dominance, whereas a negative LI (towards −1) indicated right hemisphere dominance. The main question addressed by this experiment was: what neural circuitry underlies individual differences in LI?
Analysing a group connectivity study begins by identifying which effects would be expected to occur at the within-subject level, and which would be expected at the between-subject level. Typically, within-subject effects consist of one measurement per trial, whereas between-subject effects consist of one measurement per subject. Here, the effects of words and pictures were at the within-subject level, so we included these in each subject's first level General Linear Model (GLM) and subsequent DCM, whereas LI was a between-subject measure, so we only introduced it in the second level PEB analysis. Other factors of no interest (e.g., age, gender, and handedness) were also included at the between-subject level.
Each subject's DCM consisted of a network with four brain regions: left ventral frontal (lvF), left dorsal frontal (ldF), right ventral frontal (rvF) and right dorsal frontal (rdF) cortex. Each of these regions could be modulated by two experimental conditions: pictures and words. Therefore, as detailed in the next section, fitting DCMs to the fMRI data provided us with estimates of eight interesting parameters per subject: 1) modulation of region lvF by pictures, 2) ldF by pictures, 3) rvF by pictures, 4) rdF by pictures, 5) lvF by words, 6) ldF by words, 7) rvF by words and 8) rdF by words. Here, we asked which mixtures of these eight parameters best explained the commonalities across subjects, and which best explained individual differences due to brain laterality (LI). We hypothesised that the commonalities across subjects, and the LI differences, could be expressed in:
-
1.
the network's response to pictures and/or words,
-
2.
the network's response in dorsal and/or ventral regions,
-
3.
the network's response in left and/or right hemisphere regions.
These hypotheses formed independent factors in our analysis. We will detail the specification and comparison of group level models that varied across these factors (Fig. 2), as well as illustrating the comparison of group level models in a less constrained manner, using an automatic search of the hypothesis or model space. First, we will briefly reprise the first level DCM analysis, which quantified the effects of pictures and words on each connection for each subject.
Fig. 2.
Model space. Each picture is a network of four frontal brain regions. The grey circles in the top row of each picture are left dorsal (ldF) and right dorsal (rdF) frontal cortex, and the circles in the bottom row of each picture are left ventral (lvF) and right ventral (rvF) frontal cortex. The red curved lines indicate inhibitory self-connections that were allowed to be modulated by the task. Candidate PEB models varied according to the three factors shown. Additionally, a ‘null’ model (not shown) was specified with no modulation to serve as a baseline. This gave rise to models.
3. First level analysis: DCM
Our connectivity analysis began by specifying one DCM for fMRI model for each subject (Fig. 1, bottom). Here, we focused on the subset of parameters from these models that quantified the change in each brain region's inhibitory self-connection due to words and pictures - these are the leading diagonal of matrix in the DCM for fMRI neural model (see Discussion for consideration of self-connections versus between-region connections). These parameters had values greater than zero if there was an increase in self-inhibition due to the stimuli, and so a reduction in sensitivity to inputs from the rest of the network. They had values less than zero if there was a reduction in self-inhibition (i.e., disinhibition) due to the stimuli, thus increasing sensitivity to inputs from the network.
We shall write these parameters of interest as for subject , where the superscript number 1 indicates the first level of analysis. Notation details are provided in the footnote.1
We then inverted (estimated) each subject's DCM. This involves finding the posterior density over parameters that maximises a score for the model, called the negative variational free energy . This approximates (a lower bound on) the log model evidence , which is the log of the probability of having observed the data given the model . In short, Bayesian model inversion provides an estimate of log evidence and a multivariate normal probability density over the parameters:
| (1) |
where is a vector of the expected values of the parameters (e.g., the connection strengths) and is their posterior covariance matrix. Elements on the leading diagonal of are the posterior variance (uncertainty) of each parameter and the off-diagonal elements are their corresponding covariance.
The estimation scheme, called variational Laplace (Friston et al., 2007), tunes the parameters (e.g. connection strengths) with the objective of making the predicted neuroimaging timeseries match the observed timeseries as closely as possible (maximising accuracy), while penalizing movement of the parameters away from their prior values (minimizing complexity). The ideal compromise between accuracy and complexity maximises the negative variational free energy . However, this process inherently involves uncertainty: multiple settings of the parameters could provide similarly good explanations for the data. Furthermore, there will typically be some collinearity between the parameters, meaning their effect on the predicted timeseries cannot be disambiguated. DCM quantifies this uncertainty by returning a full probability density over the parameters – both their expected values and covariance - as defined in Equation (1).
4. Second level analysis: Parametric Empirical Bayes (PEB)
Having inverted each subject's DCM, we ran a PEB analysis (spm_dcm_peb.m). The procedure begins by collating the estimated parameters of interest from all subjects . This consists of two quantities from each subject: the expected values of the parameters and their covariance matrices . Typically, only parameters relevant to the hypotheses being tested are taken to the group level, to improve statistical efficiency. For example, a priori, we expected laterality to effect differences in neuronal coupling, but we did not expect it to effect haemodynamics. This means including haemodynamic parameters in the PEB analysis could confound evidence for neural effects, especially in the presence of collinearity or conditional dependence. Effectively, parameters not included in the group level analysis (e.g. related to haemodynamics) are treated as fixed effects – meaning they are not treated as if they are sampled from a wider population – and are therefore released from the constraint of being concentrated around some group mean. Before detailing how we configured the PEB model for this experiment, we first offer a brief overview of the underlying theory.
4.1. Theory
The PEB framework specifies a hierarchical statistical model of connectivity parameters:
| (2) |
Starting with the last line of Equation (2), the observed neuroimaging data for subject are modelled as having been generated by a DCM (or by any nonlinear model) – denoted by – with parameters . Any known uninteresting effects, such as the mean of the signal, are modelled by a GLM with design matrix and parameters . The observation noise is modelled as residuals . The second line of Equation (2) is the second level of the PEB model. It says that the vector of all subjects’ neural parameters , ordered by subject and then by parameter, can be described by a GLM with design matrix and group-level parameters .
The group-level design matrix encodes the hypothesised sources of variation across subjects, with one column for each experimental effect (called regressors or covariates). For every column in the design matrix there is a corresponding entry in parameter vector which is estimated from the data. Each of these parameters is the group-level effect of one covariate (e.g. LI, handedness or gender) on one connection. Any differences between subjects not captured by this model are defined as zero-mean (I.I.D.) additive noise . These are the random effects (RFX). In other words, from the point of view of a generative model, to generate a single subject's data, one would first sample a parameter vector from the prior distribution (level 3), and add a random effect for this subject (level 2). Finally, one would generate data using the DCM and add some observation noise (level 1). More levels could be added to the hierarchy; for example, to model nested groups of subjects. However, Equation (2) stops at two levels and specifies that parameters would have fixed prior expected value and residuals , expressed in the first line of Equation (2). In summary, this hierarchical model partitions the variability in connectivity parameters across subjects into hypothesized group-level effects and uninteresting between-subject variability .
To turn this into a statistical (i.e. probabilistic) model, we first define the probability density over the error terms:
| (3) |
Where the covariance matrices , and are I.I.D. and define the zero-mean additive noise at each level of the hierarchy. The model can then be written in terms of probability densities:
| (4) |
Equation (4) specifies the joint probability of all three quantities of interest; namely, data from all subjects , the DCM parameters of all subjects and the group-level parameters . The last equality states that the timeseries for subject is generated by their DCM, with covariance (uncertainty) determined by their observation noise. The penultimate equality defines the transition from single subjects to the group level. The second equality defines the priors on the group-level parameters. Each level acts as a constraint on the level below – meaning that group level parameters constrain the estimates of the individual subjects.
Configuring the PEB model for an experiment only requires specifying the second level design matrix , detailed in the next section. The software provides default prior probability densities for the between-subject variability and the second level neural parameters In brief, the prior variances used for inverting subject-specific DCMs are also used as the prior variance around the group mean, while random effects at the between subject level are set to 1/16 of this variance. This assumes that people only acquire data from multiple subjects when the random effects from subject to subject are less than (i.e., have a quarter of the standard deviation of) the range of values that one expects a priori. For details, please see Appendix 1: PEB connectivity priors specification and Appendix 2: PEB random effects specification.
4.2. PEB: design matrix specification
The group-level design matrix defines the hypotheses about between-subject variability. It is specified in two parts: between-subject effects and within-subject effects . The between-subject design matrix (Fig. 3, left) encodes covariates or regressors in each column, with one row per subject. The software implementation in SPM expects the first column to be ones (to model the commonalities across subjects; i.e., constant or group mean) and the second column is usually the effect of interest. Other covariates are placed in the subsequent columns. For this experiment, we included covariates encoding the group mean, LI score, handedness, gender and age. (We computed the LI score by taking the first principal component of the four different measures used in Seghier et al. (2011).)
Fig. 3.
The group-level design matrix. Left: the between-subjects design matrix, with regressors (covariates) modelling the group mean, laterality index, handedness, gender and age. Middle: the within-subjects design matrix, where the diagonal encodes which DCM parameters receive group-level effects. This was set to the identity matrix to include all between-subject effects on all DCM parameters. Right: The resulting design matrix, computed according to Equation (5). Each row corresponds to one DCM parameter from one subject and each column corresponds to one group-level effect (e.g. gender) on one DCM parameter. For display purposes only, the regressors were z-scored and thresholded to produce colours in a consistent range.
An important decision – when preparing the between-subjects design matrix – is whether to mean-centre regressors which follow the first constant term. If they are mean-centred, then the first regressor (the column of ones) would correspond to the mean experiment-related changes in connectivity over subjects, and the between-subject effects would add to or subtract from this. If they were not mean-centred, then the first regressor would correspond to the baseline or intercept of the model. Here, we did mean-centre, endowing the first regressor with the interpretation of the group mean effective connectivity.
The within-subjects design matrix (Fig. 3, middle) defines which DCM connectivity parameters can receive between-subject effects (via setting each element on the leading diagonal to 1 or 0). Here, these were ordered into the effects of picture stimuli on each of the four regions, followed by the effects of words on each of the four regions. We set this to be the identity matrix (ones on the leading diagonal, zeros elsewhere), enabling the between-subject effects to influence all connections. The two parts of the design matrix are combined by the software (spm_dcm_peb.m) according to:
| (5) |
Where is the Kronecker tensor product, which replicates matrix for each element of . This is a subtle aspect of PEB that makes things slightly more complicated than simply modelling experimental effects on some response variable. Here, we have to specify what the experimental (between-subject) effects actually act upon; namely, which (within-subject) parameters. For example, the effect of laterality may be expressed in all connectivity parameters – or just one parameter. By default, the software assumes that every experimental (between-subject) effect can be expressed on every (within-subject) parameter.
A part of the resulting design matrix is shown in Fig. 3 (right). The first eight columns encode the group mean of each connectivity parameter. Columns 9–16 encode the effect of LI on each connectivity parameter, and the remaining columns encode the effects of handedness, gender and age. The rows are ordered subject-wise, corresponding to DCM parameters of interest from subject one, then from subject two, etc.
4.3. PEB: model estimation
Having specified the design matrix we inverted the PEB model (spm_dcm_peb.m), which furnished two useful quantities: the estimated group-level parameters and the group-level free energy. The parameter estimates correspond to a multivariate Gaussian density:
| (6) |
Where vector are the estimated betas or weights on the covariates in the design matrix and the covariance matrix specifies uncertainty over these weights. (This should not be confused with in Equation (3), which specifies the between-subject variability.)
Fig. 4 (left) shows the estimated second level parameters pertaining to the effects of interest; namely, the group average and the effect of LI (spm_dcm_peb_review.m). The heights of the bars are in Equation (6) and the error bars were computed from the leading diagonal of covariance matrix . Parameters 1–8 are the commonalities (group average connectivity parameters) across all subjects and parameters 9–16 are the differences in connectivity due to LI score. Each parameter quantifies the change in an inhibitory self-connection due to words or pictures.
Fig. 4.
Parameters of the group-level Bayesian General Linear Model (GLM). Left: the posterior parameter estimates, corresponding to the first 16 regressors in design matrix . These represent the commonalities (average) across subjects (GLM parameters 1–8) and the difference in connectivity due to Laterality Index (LI, GLM parameters 9–16). The units are the same as the parameters in the underlying DCMs; i.e., unitless log scaling parameters that scale the default self-connection strength of −0.5Hz. Dotted lines indicate parameters relating to the modulation of region rdF by words, discussed in the text. Right: the estimated between-subject variance of each connectivity parameter, after accounting for known effects. Parameters have the same order in both plots: 1 = Modulation of lvF by pictures, 2 = ldF by pictures 3 = rvF by pictures, 4 = rdF by pictures, 5 = lvF by words, 6 = ldF by words, 7 = rvF by words, 8 = rdF by words.
To focus on a specific example – which will be relevant to the analyses which follow – consider the DCM parameter quantifying the effect of words on the inhibitory self-connection of region rdF. This was represented by two PEB GLM parameters, indicated in Fig. 4 (left) with dashed lines. Parameter number eight is the average effect of words on rdF across subjects and parameter number 16 is the difference in this effect between subjects due to LI. From this plot, we can see that words increased region rdF's inhibitory self-connection. The effect size was 0.41 in the units of the underlying DCM. Parameter number 16 shows that the effect of words on rdF also correlated positively with LI score. The effect size was 1.79, meaning that the contribution of LI was to increase the rdF self-connection by 1.79 times the LI score. Note that these are just unstandardized effects sizes; to formally test hypotheses about where laterality effects are manifest, model comparisons are required, which we will return to shortly. Fig. 4 (right) shows the estimated between-subject variability (i.e., random effects) of the eight DCM connectivity parameters, corresponding to the leading diagonal of matrix in Equation (3).
Inversion of the full PEB model also returned a score for its quality – the (negative) variational free energy – which approximates the log model evidence:
| (7) |
This is the log of the probability of observing the neuroimaging data (from all subjects) given the entire hierarchical model . Note there is a distinction between the free energy at the first level (relating to individual subjects' DCMs) and the second level (relating to the entire group and our model of between subject effects). At the first level, the free energy of a DCM is its accuracy minus its complexity; where complexity is the difference (KL-divergence) between the estimated parameters and the priors. At the second level, the free energy is the sum of all subjects’ DCMs accuracies, minus the complexity induced by fitting the DCMs and the second-level GLM (see Equation (10) of Friston et al. (2016)). By comparing the free energy of PEB models – with different sets of parameters switched on and off2 – and selecting the PEB models with the greatest free energy, one can find the optimal explanation for the dataset as a whole.
4.4. Inference: comparing reduced PEB models
Having estimated the parameters of the group-level GLM, we next tested hypotheses, specifically to find out whether there was an effect of laterality and, if so, where it was expressed. In the PEB framework, this is done by comparing the evidence for reduced GLMs that have certain combinations of parameters switched off (fixed at their prior expectation of zero). Comparing full and reduced models in this way is conceptually similar to performing an F-test in classical statistics. In practice, the evidence and parameters of the reduced models can be derived analytically from the full model in a matter of milliseconds, using a procedure referred to as Bayesian Model Reduction (BMR, see Appendix 3: Bayesian Model Comparison and Reduction).
We defined a set of candidate models to identify the best explanation for the commonalities across subjects and the best explanation for the LI differences, in terms of the three experimental factors defined above (Fig. 2). Each factor had three levels, plus a null hypothesis, necessitating candidate models to evaluate all combinations of the factors, shown schematically in Fig. 5A. This is referred to as a factorial model space, because each of the three factors can be imagined as an axis on a graph, with each of the models situated somewhere in the space formed by those axes.
Fig. 5.
Comparison of PEB models in a pre-defined model space. A. Connectivity parameters switched on (white) and off (black) in each model. See the legend of Fig. 4 for the identity of each parameter. B. Joint probability of all candidate models. The axes list the 28 candidate models (combinations of connections) in terms of commonalities across subjects and differences between subjects due to Laterality Index (LI). In other words, the PEB model in row column had parameters relating to commonalities across subjects set according to model , and parameters relating to LI set according to model . The best model was number 4 for the commonalities and 15 for laterality differences, with 56% posterior probability. C. The same result shown in part B, summed over the columns and re-normalized, to give the posterior probability for the commonalities across subjects. D. The same result shown in panel B, summed over the rows and re-normalized, to give the posterior probability for the models of laterality index (LI). E-F. Schematic diagrams of models 4 and 15. Curved lines are self-connections modulated by words and/or pictures. G. Bayesian Model Average (BMA) of the parameters over all models. H. The BMA thresholded at posterior probability >95% for clarity. The schematic illustrates the parameters which survived thresholding. P=Pictures, W=Words, LI = Laterality Index.
We asked which of the 28 models (patterns of switched on/switched off parameters) was the best explanation for the commonalities across subjects and which was the best explanation for the LI differences. This meant comparing the evidence for a total of reduced GLMs (spm_dcm_peb_bmc.m). Fig. 5B shows the resulting matrix of posterior probabilities, where element is the probability of the GLM, which had parameters relating to commonalities across subjects configured according to model , and parameters relating to laterality (LI) configured according to model . It is clear from Fig. 5B that one GLM stood out (56% posterior probability). Its commonalities parameters were deployed according to model 4 and its LI parameters were deployed according to model 15. This result is shown more clearly by marginalising (summing) over the columns and rows and re-normalising, to give the probability for each model of the commonalities (Fig. 5C) and LI differences (Fig. 5D). Model 4, which was the best explanation for the commonalities across subjects, had modulation by words and pictures in dorsal regions (Fig. 5E). Model 15, which was the best explanation for the differences across subjects due to LI, had words specifically modulating the self-connection of rdF (Fig. 5F).
Despite the relatively strong probability for these two models, no single model could be described as an overall winner (>95% probability), which is unsurprising given the large number of models. To summarise the parameters across all models, we computed the Bayesian Model Average (BMA). This is the average of the parameters from different models, weighted by the models’ posterior probabilities (Hoeting et al., 1999; Penny et al., 2006). The averaged parameters are shown in Fig. 5G and the values are listed in Table 3. To focus our results on just those parameters that had the greatest evidence of being non-zero, we thresholded the BMA to only include parameters that had a 95% posterior probability of being present vs absent (thresholding based on the free energy is detailed in Appendix 3). As illustrated in Fig. 5H, in common across subjects, pictures increased self-inhibition in region ldF and dis-inhibited rdF; shifting the balance of activation towards the right hemisphere with no differences across subjects due to LI score. By contrast, words increased self-inhibition in rdF, which was further increased in subjects with more positive LI scores; i.e. with greater left-lateralisation of brain responses. We may conclude, therefore, that individual differences in LI score could be explained by differences in the gain or excitability of region rdF, when presented with words.
Table 2.
Symbols used in this paper.
| Variable | Dimension | Meaning |
|---|---|---|
| Ratio of prior uncertainty about group-level effects to prior uncertainty about first-level effects. | ||
| Ratio of within-subject prior variance (uncertainty) to between-subject prior RFX variance (variability). | ||
| Parameters for null effects | ||
| Total number of group level covariates | ||
| Total number of first level covariates of no interest | ||
| Observation noise for subject | ||
| Between-subject variability or random effects (RFX) | ||
| Residuals (uncertainty) about group-level parameters θ(2) | ||
| Expected values of group-level parameters θ(2) | ||
| Negative variational free energy for level of the model | ||
| First-level model (e.g. DCM) for subject | ||
| Number of experimental conditions at the first level | ||
| Log scaling parameter for precision component Qj | ||
| Norm of each covariate (regressor) | ||
| Generative model (likelihood and priors) | ||
| Expected values of the DCM parameters for subject | ||
| Prior expectation of the DCM parameters | ||
| Number of subjects (not to be confused with the multivariate normal distribution ). | ||
| Total number of free parameters per DCM taken to the group level | ||
| RFX precision matrix | ||
| Precision component | ||
| Number of modelled brain regions | ||
| Covariance matrix of the DCM parameters for subject | ||
| Prior covariance of the DCM parameters | ||
| All DCM parameters from all subjects | ||
| All DCM parameters for subject | ||
| Covariance of between-subject variability ε(2) | ||
| Covariance of residuals (uncertainty) ε(3) | ||
| Group-level parameters | ||
| Total measurements (volumes) per subject | ||
| Prior variance of DCM parameter | ||
| Design matrix | ||
| Between-subjects design matrix | ||
| Within-subjects design matrix | ||
| Design matrix for null effects | ||
| Observed timeseries from subject from all regions |
Table 3.
BMA of PEB parameters: Expected values of estimated commonalities and LI differences.
| Commonalities | Laterality Index | |||
|---|---|---|---|---|
| Region | Pictures | Words | Pictures | Words |
| lvF | −0.02 | 0.02 | 0 | 0 |
| ldF | 0.40 | 0.06 | −0.001 | −0.02 |
| rvF | −0.07 | 0 | 0 | 0.01 |
| rdF | −0.23 | 0.38 | −0.02 | 1.74 |
*‘Pictures’ and ‘Words’ are the log of scaling parameters that multiply up or down the default self-connection (−0.5Hz). Positive values indicate greater self-inhibition than the default and negative values indicate less self-inhibition than the default.
4.5. Inference: family analysis
In the model comparison above, there were three factors with three levels each, plus a ‘null’ model, giving models of the commonalities across subjects and 28 models of the LI differences, thus candidate models in total. Rather than conclude the analysis by plotting 784 values, it is clearer to report a single statistic for each of the three factors. To do this, we grouped the 784 models into ‘families’ according to each of the experimental factors and compared the pooled evidence of each family (Penny et al., 2010). The results (computed using spm_dcm_peb_bmc_fam.m) are shown in Fig. 6 and are described in the sections which follow.
Fig. 6.
Family-wise analyses for each of three experimental factors. Each plot is a separate analysis with the same 784 PEB models grouped into different families. For each plot, element represents the pooled probability for models in which the commonalities parameters were set according to family and the Laterality Index (LI) parameters were set according to family . For example, for Factor 1, element is the pooled probability for all models with commonalities parameters set according to family 1 and Laterality Index parameters set according to family 2. The families numbered 1–4 for each analysis were as follows. Factor 1 (task): 1 = pictures and words, 2 = words, 3 = pictures, 4 = none. Factor 2 (dorsal/ventral): 1 = dorsal and ventral, 2 = dorsal, 3 = ventral, 4 = none. Factor 3 (left/right): 1 = left and right, 2 = left, 3 = right, 4 = none. See Fig. 2 for an illustration of each factor.
4.5.1. Factor 1: whether the network was modulated by words and/or pictures
We grouped the 784 GLMs into four families according to the Task factor (see Fig. 2), plus the null model with no modulation. To recap, the families differed according to which experimental conditions modulated the self-connections: 1) pictures and words, 2) words, 3) pictures and 4) none. The evidence for the models within each family was pooled, and the families were compared under the prior that each family was equally likely. There was 86% probability that family 1 (pictures and words both modulating) was the best explanation for the commonalities and family 2 (only words) was the best explanation for LI differences. Therefore, we found that the frontal network was modulated by words and pictures, but LI differences were specifically explained by the response to words.
4.5.2. Factor 2: whether the stimuli modulated dorsal and/or ventral regions
We next grouped the same 784 models into families defined by factor 2 (dorsal vs ventral). The families were 1) dorsal and ventral, 2) dorsal, 3) ventral and 4) none. The probability that family 2 (dorsal) best explained both the commonalties and the laterality differences was 69%. The second best family, with 20% probability, was family 1 (both dorsal and ventral regions) for the commonalities. We can conclude that task effects were mainly expressed in the dorsal regions, and the effect of LI was specific to dorsal regions.
4.5.3. Factor 3: whether the stimuli modulated left and/or right hemisphere
Finally, we organised the same models according to factor 3 (left vs right). The families were 1) left and right, 2) left, 3) right and 4) none. There was 84% probability for family 1 (both left and right regions), whereas family 3 (right hemisphere) was the best explanation for differences due to LI. Therefore, differences in hemispheric asymmetry across subjects could be explained specifically by differences in the right hemisphere.
4.6. Interim summary: hypothesis-based analyses
To summarise the analyses so far, we first estimated a single DCM for each subject with all DCM parameters of interest switched on. We then formed a ‘full’ second level GLM with parameters representing the group average connection strengths (commonalities), the differences between subjects due to LI, and covariates of no interest (handedness, age and gender). We then compared 784 reduced versions of this GLM with different combinations of parameters – representing the commonalities and laterality differences across subjects – switched off. This provided free energy scores quantifying the evidence for each candidate model. By grouping the reduced GLMs into families – and pooling their evidence – we were able to quantify the effect of between-subject factors on within-subject parameters (i.e., condition specific changes in connectivity).
4.7. Inference: search over reduced PEB models
In the analyses above, large numbers of pre-defined models were compared (in seconds) using a technology called Bayesian Model Reduction (BMR). If there were no strong hypotheses about between-subject effects on connectivity, and the objective was simply to ‘prune’ any GLM parameters that did not contribute to the model evidence, BMR could be used to automatically search over reduced models. This more exploratory approach is conducted under the assumption that all reduced models are equally probable a priori, and thus the ‘full’ model only contains parameters that are biologically plausible.
For this experiment, the GLM contained 40 parameters in total (five between subject effects on eight DCM parameters – Fig. 3, right) and the objective was to find the best reduced GLM with certain parameters switched off. Evaluating all possible reduced GLMs would have required evaluating models, which is not possible in a reasonable amount of time. Therefore, to reduce the computational demand, an automatic search procedure is available (spm_dcm_peb_bmc.m), originally referred to as ‘post-hoc’ DCM analysis (Friston and Penny, 2011; Rosa et al., 2012). This procedure compares the evidence for reduced models using Bayesian Model Reduction, iteratively discarding parameters that do not contribute to model evidence. The iterative procedure stops when discarding any parameter starts to decrease model evidence. Technically, this is known as a greedy search and allows thousands of models to be compared quickly and efficiently.
A Bayesian Model Average (BMA) is then calculated over the 256 models from the final iteration of the greedy search, and the result for our data is shown in Fig. 7. The middle and right plots show the parameters with a posterior probability of being non-zero greater than 0.95. It is clear that the estimated response to words and the difference in the response to words due to LI were remarkably similar to the analysis above; however, in this illustrative analysis we did not have to declare any specific a priori models. The estimated response to pictures was slightly different, modulating region rvF rather than rdF. If this difference was of experimental interest, these two explanations could be compared in terms of their free energy and the relative probability of each explanation quantified.
Fig. 7.
BMA performed on the final 256 models of an automatic parameter search. Left: Posterior parameter estimates based on the BMA. Only parameters relating to commonalities and Laterality Index are shown. Middle: Thresholded BMA, where parameters with probability greater than 95% of being present vs. absent were retained. Each parameter's individual probability was assessed by comparing the evidence for all models (from the final iteration of a greedy search) which had the parameter switched on, versus all models which had the parameter switched off. Right: The schematic shows effects of Pictures (P), Words (W) and Laterality Index (LI).
4.8. Inference: network structure
Although our hypotheses were about changes in effective connectivity due to words and pictures (parameter matrix from the DCM neural model), it is useful to present these results in the context of the average effective connectivity across experimental conditions (parameter matrix ). This is because the parameters encode the deviation from the average connection strengths for each experimental condition. To bring these parameters to the group level, we specified and estimated a separate PEB model for the average connectivity , and performed an automatic search over reduced models. While these analyses could have been done in the same PEB model, including a large number of parameters can cause a dilution of evidence effect, as well as inducing a much larger search space. Examining each set of parameters separately can therefore help to keep the analysis focused and tractable. The parameter estimates are listed in Table 4. There were two non-trivial effects of Laterality Index: lvF- > ldF (−0.35Hz) and rdF- > ldF (0.45Hz). Thus, in subjects with greater left hemisphere dominance as scored by a more positive LI, region ldF was more inhibited by lvF and more excited by rdF, independent of whether the stimuli were pictures or words.
Table 4.
BMA of PEB parameters: Average connectivity across experimental conditions (A-matrix).
| Source | Target | Commonalities | Laterality index |
|---|---|---|---|
| lvF | lvF | −0.20 | 0 |
| lvF | ldF | 0.16 | −0.35 |
| lvF | rvF | 0.08 | 0 |
| ldF | lvF | 0.08 | 0 |
| ldF | ldF | −0.18 | 0 |
| ldF | rdF | 0.06 | 0 |
| rvF | lvF | 0.10 | 0 |
| rvF | rvF | −0.30 | 0 |
| rvF | rdF | 0.08 | 0 |
| rdF | ldF | 0.18 | 0.45 |
| rdF | rvF | 0 | 0 |
| rdF | rdF | −0.32 | 0 |
* Between-region connections are in units of Hz. Self-connections, where the source and target are the same, are the log of scaling parameters that multiply up or down the default value −0.5Hz.
5. Prediction: cross-validation
In the previous steps, we identified a laterality effect on the response of region rdF to words. Our final question was whether the size of this effect was large enough to be interesting; i.e., whether we could predict a subject's LI from their neural response. Questions of this sort – assessing predictive validity – are particularly important for studies determining the clinical significance of model parameters. To address this we used leave-one-out cross validation (spm_dcm_loo.m). A PEB model was fitted to all but one subject, and covariates for the left-out subject were predicted. This was repeated with each subject left out and the accuracy of the prediction was recorded. For the technical details of this procedure see Appendix 4: Leave-one out cross-validation with PEB.
We assessed whether we could predict subjects' LI based on their modulation of rdF by words, as well as their known covariates - handedness, age and gender. The red line in Fig. 8 (left) shows the predicted (mean-centred) LI score for each left out subject. The shaded area is the 90% credible interval of the prediction and the dashed black line is the actual mean-centred LI. In this example, 44 out of 60 of subjects had their true LI within the estimated 90% credible interval (shaded area). Fig. 8 (right) plots the out-of-samples correlation of the actual LI against the (expected value of) the predicted LI for each left-out subject. The Pearson's correlation coefficient was 0.34, p = 0.004. Therefore, we can conclude that the effect size estimated by DCM was sufficiently large to predict the left-out subjects' LI with performance above chance, although there was still a lot of variability to be explained.
Fig. 8.
Leave-one-out cross validation. Left: the out-of-samples estimate of (mean-centred) Laterality Index for each subject (red line) with 90% confidence interval (shaded area). The black dashed line is the actual group effect. Right: The actual subject effect plotted against the expected value of the estimated subject effect.
6. Analysis summary
The above analyses characterised the effective connectivity underlying individual differences in lateralisation of neural responses, in the context of a semantic processing task. We asked whether the commonalities and LI differences across subjects were expressed in condition specific changes in connectivity due to picture and/or word stimuli, in dorsal and/or ventral regions, and in left and/or right hemispheres (Fig. 2). Comparisons of PEB models showed that in common across subjects, the frontal network was modulated by pictures and words, in dorsal regions of both hemispheres (Fig. 6). The parameters revealed a double dissociation between picture and word stimuli: pictures inhibited responses in region ldF, whereas words inhibited responses in rdF (Fig. 5H). Differences between subjects – due to LI score – were specifically expressed in the response to words in the dorsal region of the right hemisphere (Figs. 5H and 6). This effect was positive, meaning that having a higher LI score (being more left lateralized) was associated with greater inhibition of region rdF. Cross-validation showed that this effect was sufficiently large to predict left-out subjects’ LI.
7. Discussion
We have illustrated an empirical (i.e. hierarchical) Bayesian procedure for conducting group connectivity analyses, using DCM and PEB. This begins with a first level analysis, modelling within-subject effects using neural models (DCMs). The connectivity parameters of interest are then taken to a second level analysis, where they are modelled using a Bayesian GLM. Hypotheses are tested by comparing the evidence for the full GLM against reduced GLMs, where certain parameters are switched off (i.e., fixed at zero). Finally, the predictive validity of the model parameters is assessed using cross-validation.
The first level analysis (detailed in the companion paper) involves designing an architecture for each subject's model, i.e. deciding which brain regions to include, selecting which connections should be informed by the data, and selecting which connections can be modulated by the experimental manipulations. The second level analysis involves deciding which parameters to treat as random effects, i.e. sampled from the population, and including these in the PEB model. Typically, only the parameters that are relevant to the hypotheses will be included in the group analysis, and for task-based DCM studies these will primarily be matrix of the DCM neural model, quantifying the modulations of connections by each experimental condition. In this example study, we only allowed modulations of the self-connections on each region, and it was these parameters that we took to the group level. These self-connection parameters provide gain control – i.e. they determine the sensitivity of each region to input coming from the rest of the network. Limiting task effects to these parameters can facilitate a more identifiable model (i.e. eliciting parameters that can be distinguished from one another with higher confidence), thereby maximising the chances of identifying task-related effects. Furthermore, the self-connections lend themselves to a straightforward biological interpretation, namely modifying the inhibitory-excitatory balance within each region (Bastos et al., 2012). Nevertheless, the same analysis pipeline could be applied if the interesting hypotheses were about between-region connections, or if they were about the average or baseline connectivity rather than task effects (parameter matrix of the neural model). If in doubt, the Bayesian model comparison procedure described here could be used to decide which parameters to modulate by the task (by comparing a full PEB model against reduced models), as well as which parameters to treat as random effects (by comparing PEB models with different within-subject design matrices).
Hierarchical Bayesian modelling is an established approach in the context of functional imaging data (Woolrich et al., 2004; Penny et al., 2005). The PEB framework (Friston et al., 2016) introduced hierarchical modelling of the connectivity parameters of DCMs, thereby linking individuals' connection strengths to the group(s) from which they were sampled. This has so far proved useful for modelling fMRI data (Dijkstra et al., 2017; Park et al., 2017; Jung et al., 2018; Zhou et al., 2018a, 2018b) and MEG/EEG/LFP data (Pinotsis et al., 2016; Fardo et al., 2017; Papadopoulou et al., 2017; Penny et al., 2018). The PEB framework has been validated in terms of reproducibility in the context of ERP data (Litvak et al., 2015) and – going forward – will benefit from further validation to assess the suitability of certain priors in novel applications. While this study has demonstrated the main features of the framework, there are others we did not cover for brevity. For instance, to pull subjects’ parameter estimates out of local minima, the first level model estimation can be re-initialized multiple times, using the group-level posteriors from the PEB model as empirical priors for each subject (Friston et al., 2015). This may be particularly useful where highly non-linear DCMs are in play. Furthermore, PEB analyses do not need to be limited to two levels. In a recent study investigating changes in the motor system following thalamic surgery for tremor, resting state fMRI data from multiple time points were modelled using separate DCMs (Park et al., 2017). The DCM parameters from each time point were collated into a subject-specific PEB model, forming the second level of the hierarchy. The second level parameters were then taken up to the group level, modelled using a third-level GLM, to capture commonalities and differences among subjects. By using this PEB-of-PEBs approach, the deep hierarchies implicit in certain kinds of experimental designs can be modelled explicitly.
Here, we presented two methods for testing hypotheses at the group level – either designing a set of models/families explicitly, or automatically searching over potentially thousands of reduced models to ‘prune’ parameters that did not contribute to model evidence. While the two approaches gave similar results on the dataset presented here, this is not guaranteed to be the case in general, and there are disadvantages to using an automatic (greedy) search. The large number of possible models means the search cannot usually be exhaustive, and so a better solution may be found by a well thought-out set of models. Furthermore, an automatic model search introduces the temptation to construct post-hoc explanations for the surviving parameters. The number of possible reduced models might emphasise the utility of a priori hypotheses when using Bayesian model reduction.
Specifying the PEB model in this study involved setting the between-subjects design matrix a priori. A complementary application of hierarchical modelling would be to discover group membership; i.e., to perform unsupervised classification or clustering of subjects based on their connectivity. A recently developed approach for this combines DCM with finite mixture models in a single hierarchical model (Raman et al., 2016). A complementary approach is possible with the PEB scheme, by searching over candidate group assignments using Bayesian Model Reduction (BMR). It would be interesting to compare this against sampling approaches, which are generally slower but constitute the gold standard in terms of robustness.
As touched on in the introduction, the analysis workflow presented here differs from that of the random effects Bayesian model selection (RFX BMS) method of group analysis (Stephan et al., 2009). With RFX BMS, multiple DCMs are specified and estimated per subject, and their log model evidence (free energy) is used as the basis for computing a probability for each model. There is no restriction on the form of the models that can be compared, provided that a free energy can be computed. With the PEB scheme, only one DCM is specified and estimated per subject, and hypotheses are tested about the commonalities and differences in the parameters (connection strengths) across subjects. The PEB workflow was not designed for comparing different forms of DCM forward model per subject, e.g. to compare a linear versus nonlinear model of neural activity. (A workaround could found, however, by defining separate PEB models for each type of DCM and comparing their evidence). Despite this restriction, the PEB workflow introduces potential advantages that are still to be fully assessed. For instance, because a full posterior probability density over the DCM parameters is taken to the group level, subjects with noisier data should automatically contribute less to the group result. Furthermore, we speculate that estimating only a ‘full’ model with all parameters in play will typically engender a ‘smoother’ free energy landscape than multiple reduced models per subject with fewer parameters. In other words, having a high dimensional parameter space could, counterintuitively, ensure more robust convergence; in the sense that there are more opportunities to escape from local minima in high dimensional parameter spaces. This may be particularly prescient for electrophysiological DCMs, where the models are more non-linear.
To get started with the PEB framework, we refer readers to the illustrated step-by-step guide and example data accompanying this article at https://github.com/pzeidman/dcm-peb-example.
Acknowledgments
The Wellcome Centre for Human Neuroimaging is supported by core funding from Wellcome [203147/Z/16/Z]. We thank Wiktor Olszowy for helpful feedback on the manuscript.
Footnotes
Vectors are denoted by lower case letters in bold italics ( and matrices by upper case letters in bold italics (. Other variables and function names are written in plain italics (). The dot symbol on its own means multiplication and when positioned above a variable (e.g. ) denotes the derivative of a variable with respect to time. An element in row and column of matrix is denoted by . All variables and their dimensions are listed in Table 2.
In the DCM/PEB framework, candidate models are specified by choosing which parameters should be ‘switched on’ and ‘switched off’. A parameter is switched on by allowing it to be informed by the data – i.e. assigning a prior probability density with positive variance (uncertainty). A parameter is switched off by fixing it to a default value (typically zero), by assigning a prior probability density with very low variance (high certainty).
Appendix 1. PEB connectivity priors specification
In the DCM/PEB framework, the first-level models have priors defined by a multivariate normal density , which are assumed to be the same for each subject. The second-level model is a Bayesian GLM with parameters . For convenience, the priors on these parameters are derived from the first level priors. By default, the second level priors are identical to those of the first level, except the prior variance is adjusted based on the scaling of the design matrix . The norm of the columns of between-subjects design matrix is calculated as follows:
| (8) |
Where function is the sum-of-squares and is the number of subjects. The second level prior is defined according to:
| (9) |
This specification has the effect of scaling the prior variance of all second level parameters relating to a particular covariate by the norm of that covariate. Here, we set the fixed parameter to one, meaning that we expected the effect sizes at the second level to be the same as the first level (aside from the scaling of the regressors).
Appendix 2. PEB random effects specification
Any differences between subjects not captured by the second-level Bayesian GLM are defined as additive noise or random effects (RFX) :
| (10) |
This is zero-mean additive noise with precision matrix (inverse of the covariance matrix) . The second line of Equation (10) specifies the precision matrix as a weighted combination of precision components. The Kronecker product replicates across the subjects and is the identity matrix. Precision component matrix is the minimum allowable precision for each parameter and each log scaling parameter multiplies up or down the corresponding precision component . These parameters are estimated from the data with prior . Taking their exponential guarantees that the precision is positive.
Specifying this model requires selecting the precision components . The PEB software provides several choices:
-
•
A single precision component
-
•
One component per field; i.e., one component for parameters from matrix of the DCM neural model, one component for parameters from matrix , etc.
-
•
One component per DCM connectivity parameter
-
•
Custom allocation of parameters to components
We selected the third option, one component per connection – meaning that the between-subject variability for each connection was estimated individually. The form of these components is illustrated in Fig. 9. Each component is a matrix with a single non-zero entry, which for convenience is set as a function of the corresponding first level DCM parameter's variance:
| (11) |
Where is the prior variance of any single subject's DCM parameter and fixed parameter . This equation says that we expect the between-subject variance to be smaller than the within-subject effect size; i.e. the expected between-subject standard deviation is a quarter of the prior standard deviation of the corresponding parameter. This means that the default between-subject precision for each parameter encoded on the leading diagonal of is .
Fig. 9.
The form of the precision components. Each component has a single non-zero entry at element , which is set to 16. These are scaled by the corresponding log scaling parameters . See Equation (10) for further details.
Appendix 3. Bayesian Model Comparison and Reduction
Hypotheses are tested in PEB by comparing the evidence for different models, which is called Bayesian model comparison or selection (c.f. Penny, 2012). The relative evidence for model relative to model is given by the Bayes factor in favour of model 1:
| (12) |
In practice, the model evidence cannot be computed exactly. In the DCM/PEB framework it is approximated by the (negative) variational free energy, which to recapitulate Equation (7) is:
| (13) |
Having computed the free energy for two models and , it is usually more convenient to work with the log of the Bayes factor:
| (14) |
This follows because division becomes subtraction when the values are logs. Sometimes the log Bayes factor is reported directly, or alternatively it can be expressed as the posterior probability for one model over another. This is given by Bayes rule under equal priors for each model:
| (15) |
The final line is simply a sigmoid function of the log Bayes factor. For example, a Bayes factor of 20 in favour of model 1 (20 times greater evidence for model 1 than model 2) is the same as a log Bayes factor of , which is equivalent to posterior probability . This value is typically used as the threshold for ‘strong evidence’ in favour of model 1 (Kass and Raftery, 1995). This approach can easily be generalized to the case where there are more than two models, by computing the log Bayes factor of each model relative to the model with the lowest evidence.
Bayesian model comparison is predicated on computing the free energy for each model. In the setting where a ‘full’ model has been estimated, with all parameters of interest informed by the data, Bayesian Model Reduction (BMR) enables the free energy and parameters for ‘reduced’ models to be computed analytically under certain mild assumptions (Friston and Penny, 2011; Friston et al., 2016). The difference between a full and a reduced model is the priors: for example, in a reduced model, a certain subset of connections may be switched off (i.e., fixed at zero). This enables large numbers of models to be evaluated in seconds. BMR is used extensively in the PEB framework (spm_log_evidence_reduce.m), for example to search over potentially thousands of reduced models.
One application of BMR – used in this paper – was to threshold the parameters of a PEB model based on the free energy (e.g. Fig. 5H). This involved, for each parameter , performing a Bayesian model comparison of the PEB model with parameter ‘switched on’ (free to vary) versus the equivalent PEB model with parameter ‘switched off’ (fixed at its prior expectation of zero). The difference in evidence can then be converted to a posterior probability using Equation (15) – and it is common to focus discussion on the parameters that exceed a certain posterior probability threshold. The advantage of thresholding using this approach – rather than just selecting parameters that have marginal posteriors (error bars) that deviate from zero – is that the full covariance of the parameters is considered when computing a difference in free energy. A proper assessment is therefore made as to whether the parameter contributed to model evidence.
Appendix 4. Leave-one out cross-validation with PEB
The implementation of Leave-one out (LOO) cross-validation in the PEB framework (spm_dcm_loo.m) is most simply explained with an example. Consider a dataset with five subjects, each of whom has one parameter of interest in their first level DCM model. These subjects are drawn from two groups, where each group has mean parameter values 0.7 and 0.3. Following model estimation, their parameters have expected values: 0.7, 0.6, 0.8, 0.2 and 0.4 respectively. To create the training set, we leave out the first subject and specify a GLM on the remaining four:
| (16) |
The design matrix has two regressors – to model the group mean and the group difference. Calculating the parameters of this GLM, using the standard formula, gives:
| (17) |
This says that the average connectivity across subjects is value 0.5, plus or minus 0.2 depending on the subject's group. Now we have an estimated model based on the training data, we can predict the row of the design matrix for the left out subject , given their connection strength . Specifically, we know the first element of this subject's row in the design matrix, but not the second:
| (18) |
Where is the unknown group membership. Recall that the left out subject had connectivity parameter . We can construct a GLM for this subject as follows:
| (19) |
Here, the role of the design matrix and parameters has been switched – we know the parameters , and we want to infer the missing element of the design matrix . Solving this equation gives , which is the correct group membership assignment for this subject.
To apply this approach in the PEB framework, for predicting discrete or continuous variables, a group-level PEB model is fitted to the training data in the usual way, which includes all but the left-out subject . The hierarchical model over parameters is therefore:
| (20) |
Where subscript refers to training data. A separate PEB model is then fitted to the test data, which includes only the left out subject , configured as follows:
| (21) |
Where subscript indicates the test subject and is the index of the column in the design matrix we are trying to predict. The first two lines of Equation (21) recapitulate the definition of the second level model in the PEB framework. The third and fourth lines define the prior expectation and covariance respectively for the test subject's parameters. The prior expectation is a vector where each element equals the corresponding element in the subject's row in the original design matrix , except for the value to be predicted, which is set to zero. The prior variance (uncertainty) for the parameter to be predicted is set to a multiple of the regressor's variance in the training data. To improve stability of the model estimation, a positive additive constant is added to the prior variance of all parameters, which decreases over repeated estimations of the left-out subject's model (with values ).
Conflicts of interest
None.
References
- Bastos A.M., Usrey W.M., Adams R.A., Mangun G.R., Fries P., Friston K.J. Canonical microcircuits for predictive coding. Neuron. 2012;76:695–711. doi: 10.1016/j.neuron.2012.10.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dijkstra N., Zeidman P., Ondobaka S., van Gerven M.A.J., Friston K. Distinct top-down and bottom-up brain connectivity during visual perception and imagery. Sci. Rep. 2017;7:5677. doi: 10.1038/s41598-017-05888-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fardo F., Auksztulewicz R., Allen M., Dietz M.J., Roepstorff A., Friston K.J. Expectation violation and attention to pain jointly modulate neural gain in somatosensory cortex. Neuroimage. 2017;153:109–121. doi: 10.1016/j.neuroimage.2017.03.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friston K., Penny W. Post hoc Bayesian model selection. Neuroimage. 2011;56:2089–2099. doi: 10.1016/j.neuroimage.2011.03.062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friston K., Zeidman P., Litvak V. Empirical Bayes for dcm: a group inversion scheme. Front. Syst. Neurosci. 2015;9:164. doi: 10.3389/fnsys.2015.00164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friston K., Mattout J., Trujillo-Barreto N., Ashburner J., Penny W. Variational free energy and the Laplace approximation. Neuroimage. 2007;34:220–234. doi: 10.1016/j.neuroimage.2006.08.035. [DOI] [PubMed] [Google Scholar]
- Friston K.J., Harrison L., Penny W. Dynamic causal modelling. Neuroimage. 2003;19:1273–1302. doi: 10.1016/s1053-8119(03)00202-7. [DOI] [PubMed] [Google Scholar]
- Friston K.J., Litvak V., Oswal A., Razi A., Stephan K.E., van Wijk B.C.M., Ziegler G., Zeidman P. Bayesian model reduction and empirical Bayes for group (DCM) studies. Neuroimage. 2016;128:413–431. doi: 10.1016/j.neuroimage.2015.11.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoeting J.A., Madigan D., Raftery A.E., Volinsky C.T. Bayesian model averaging: a tutorial. Stat. Sci. 1999:382–401. [Google Scholar]
- Jung K., Friston K.J., Pae C., Choi H.H., Tak S., Choi Y.K., Park B., Park C.A., Cheong C., Park H.J. Effective connectivity during working memory and resting states: a DCM study. Neuroimage. 2018;169:485–495. doi: 10.1016/j.neuroimage.2017.12.067. [DOI] [PubMed] [Google Scholar]
- Kass R.E., Raftery A.E. Bayes factors. J. Am. Stat. Assoc. 1995;90:773–795. [Google Scholar]
- Litvak V., Garrido M., Zeidman P., Friston K. Empirical Bayes for group (dcm) studies: a reproducibility study. Front. Hum. Neurosci. 2015;9:670. doi: 10.3389/fnhum.2015.00670. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Papadopoulou M., Cooray G., Rosch R., Moran R., Marinazzo D., Friston K. Dynamic causal modelling of seizure activity in a rat model. Neuroimage. 2017;146:518–532. doi: 10.1016/j.neuroimage.2016.08.062. [DOI] [PubMed] [Google Scholar]
- Park H.J., Pae C., Friston K., Jang C., Razi A., Zeidman P., Chang W.S., Chang J.W. Hierarchical dynamic causal modeling of resting-state fMRI reveals longitudinal changes in effective connectivity in the motor system after Thalamotomy for essential tremor. Front. Neurol. 2017;8:346. doi: 10.3389/fneur.2017.00346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penny W., Mattout J., Trujillo-Barreto N. Elsevier; London: 2006. Bayesian Model Selection and Averaging. Statistical Parametric Mapping: the Analysis of Functional Brain Images. [Google Scholar]
- Penny W., Iglesias-Fuster J., Quiroz Y.T., Lopera F.J., Bobes M.A. Dynamic causal modeling of preclinical autosomal-dominant alzheimer's disease. J. Alzheimer's Dis. 2018;65(3):697–711. doi: 10.3233/JAD-170405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penny W.D. Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage. 2012;59:319–330. doi: 10.1016/j.neuroimage.2011.07.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penny W.D., Trujillo-Barreto N.J., Friston K.J. Bayesian fMRI time series analysis with spatial priors. Neuroimage. 2005;24:350–362. doi: 10.1016/j.neuroimage.2004.08.034. [DOI] [PubMed] [Google Scholar]
- Penny W.D., Stephan K.E., Daunizeau J., Rosa M.J., Friston K.J., Schofield T.M., Leff A.P. Comparing families of dynamic causal models. PLoS Comput. Biol. 2010;6 doi: 10.1371/journal.pcbi.1000709. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pinotsis D.A., Perry G., Litvak V., Singh K.D., Friston K.J. Intersubject variability and induced gamma in the visual cortex: DCM with empirical Bayes and neural fields. Hum. Brain Mapp. 2016;37:4597–4614. doi: 10.1002/hbm.23331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Raman S., Deserno L., Schlagenhauf F., Stephan K.E. A hierarchical model for integrating unsupervised generative embedding and empirical Bayes. J. Neurosci. Methods. 2016;269:6–20. doi: 10.1016/j.jneumeth.2016.04.022. [DOI] [PubMed] [Google Scholar]
- Rigoux L., Stephan K.E., Friston K.J., Daunizeau J. Bayesian model selection for group studies - revisited. Neuroimage. 2014;84:971–985. doi: 10.1016/j.neuroimage.2013.08.065. [DOI] [PubMed] [Google Scholar]
- Rosa M.J., Friston K., Penny W. Post-hoc selection of dynamic causal models. J. Neurosci. Methods. 2012;208:66–78. doi: 10.1016/j.jneumeth.2012.04.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Seghier M.L., Josse G., Leff A.P., Price C.J. Lateralization is predicted by reduced coupling from the left to right prefrontal cortex during semantic decisions on written words. Cerebr. Cortex. 2011;21:1519–1531. doi: 10.1093/cercor/bhq203. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stephan K.E., Penny W.D., Daunizeau J., Moran R.J., Friston K.J. Bayesian model selection for group studies. Neuroimage. 2009;46:1004–1017. doi: 10.1016/j.neuroimage.2009.03.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Woolrich M.W., Behrens T.E., Beckmann C.F., Jenkinson M., Smith S.M. Multilevel linear modelling for FMRI group analysis using Bayesian inference. Neuroimage. 2004;21:1732–1747. doi: 10.1016/j.neuroimage.2003.12.023. [DOI] [PubMed] [Google Scholar]
- Zhou Y., Friston K.J., Zeidman P., Chen J., Li S., Razi A. The hierarchical organization of the default, dorsal attention and salience networks in adolescents and young adults. Cerebr. Cortex. 2018;28:726–737. doi: 10.1093/cercor/bhx307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou Y., Zeidman P., Wu S., Razi A., Chen C., Yang L., Zou J., Wang G., Wang H., Friston K.J. Altered intrinsic and extrinsic connectivity in schizophrenia. Neuroimage Clin. 2018;17:704–716. doi: 10.1016/j.nicl.2017.12.006. [DOI] [PMC free article] [PubMed] [Google Scholar]









