Author manuscript; available in PMC: 2026 Mar 29.
Published in final edited form as: Neuroimage. 2022 Aug 27;263:119595. doi: 10.1016/j.neuroimage.2022.119595

Mixtures of large-scale dynamic functional brain network modes

Chetan Gohil a,*,#, Evan Roberts a,#, Ryan Timms a,#, Alex Skates a, Cameron Higgins a, Andrew Quinn a, Usama Pervaiz b, Joost van Amersfoort c, Pascal Notin c, Yarin Gal c, Stanislaw Adaszewski d, Mark Woolrich a
PMCID: PMC7618940  EMSID: EMS212883  PMID: 36041643

Abstract

Accurate temporal modelling of functional brain networks is essential in the quest for understanding how such networks facilitate cognition. Researchers are beginning to adopt time-varying analyses for electrophysiological data that capture highly dynamic processes on the order of milliseconds. Typically, these approaches, such as clustering of functional connectivity profiles and Hidden Markov Modelling (HMM), assume mutual exclusivity of networks over time. Whilst a powerful constraint, this assumption may be compromising the ability of these approaches to describe the data effectively. Here, we propose a new generative model for functional connectivity as a time-varying linear mixture of spatially distributed statistical “modes”. The temporal evolution of this mixture is governed by a recurrent neural network, which enables the model to generate data with a rich temporal structure. We use a Bayesian framework known as amortised variational inference to learn model parameters from observed data. We call the approach DyNeMo (for Dynamic Network Modes), and show using simulations that it outperforms the HMM when the assumption of mutual exclusivity is violated. In resting-state MEG, DyNeMo reveals a mixture of modes that activate on fast time scales of 100–150 ms, which is similar to state lifetimes found using an HMM. In task MEG data, DyNeMo finds modes with plausible, task-dependent evoked responses without any knowledge of the task timings. Overall, DyNeMo provides decompositions that are an approximate remapping of the HMM’s while showing improvements in overall explanatory power. However, the magnitude of the improvements suggests that the HMM’s assumption of mutual exclusivity can be reasonable in practice. Nonetheless, DyNeMo provides a flexible framework for implementing and assessing future modelling developments.

1. Introduction

Functional connectivity (FC, Friston, 1994) has traditionally been studied across the duration of an experiment, be it metabolic (e.g. Beckmann et al., 2005; Cole et al., 2014; Smith et al., 2009; Stevens, 2016) or electrophysiological in nature (e.g. Brookes et al., 2011b; De Pasquale et al., 2012; Hipp et al., 2012; Luckhoo et al., 2012). Such studies have shown that the brain forms well-defined spatio-temporal networks which are seen both in task (Quinn et al., 2018) and at rest (Engel et al., 2013). However, there is a growing body of evidence supporting the idea that these networks are transient (Baker et al., 2014; O’Neill et al., 2015; Vidaurre et al., 2016), and that they emerge and dissolve on sub-second time scales. It is now well established that the dynamics of these networks underpin healthy brain activity and cognition (Fries, 2015) and that the disruption of FC is implicated in disease (Stam et al., 2009; Stoffers et al., 2008).

A systematic understanding of the neuroscientific significance of these networks of whole-brain activity is only facilitated by accurate modelling across the spatial, temporal and spectral domains. Sliding window analyses have been used successfully to study time-varying FC in both M/EEG (Betti et al., 2013; Brookes et al., 2011a; 2014; Brovelli et al., 2017; Carbo et al., 2017; De Pasquale et al., 2010; 2016; O’Neill et al., 2015; 2017) and fMRI (Allen et al., 2014; Chang et al., 2013; Elton and Gao, 2015; Hutchison et al., 2013; Kucyi and Davis, 2014; Liégeois et al., 2016; Lindquist et al., 2014; Preti et al., 2017; Sakoglu et al., 2010; Tagliazucchi et al., 2012). Recent studies have calculated very short, or even instantaneous, time-point-by-time-point estimates of FC, which are then combined with a second stage of clustering such as k-means (e.g. O’Neill et al., 2015) to pool over recurrent patterns of otherwise poorly estimated FC. These two-stage approaches allow access to FC on fast time scales (Sporns et al., 2021; Tewarie et al., 2019).

Although they remain popular, sliding window analyses are a heuristic approach to data analysis and lack a generative model. An alternative approach to studying dynamics of functional brain networks is via the adoption of a formal model. A Hidden Markov Model (HMM) (Rabiner and Juang, 1986) is one such option. As with the two-stage approaches mentioned above, HMMs can pool non-contiguous periods of data together to make robust estimations of the activity of brain networks, including FC. However, they do so by incorporating these two stages into one model. HMMs (as well as other techniques, such as microstates; Michel and Koenig, 2018) have been used to show that brain networks evolve at faster time scales than previously suggested by competing techniques (such as independent component analysis) (Baker et al., 2014). In the context of M/EEG, HMMs have been used to elucidate transient brain states (Vidaurre et al., 2016), model sensor level fluctuations in covariance (Woolrich et al., 2013) and reveal latent task dynamics attributed to distributed brain regions (Quinn et al., 2018). More recently, Seedat et al. applied an HMM to detect transient bursting activity and showed it was correlated with aspects of the electrophysiological connectome (Seedat et al., 2020), whilst Higgins et al. were able to show that replay in humans coincides with activation of the default mode network (Higgins et al., 2021).

Although very powerful, convenient, and informative, traditional HMMs are themselves limited in two key ways. Firstly, there is the modelling choice that the state at any time point is only conditionally dependent on the state at the previous time point (i.e. the model is Markovian). This limits the modelling capability of the technique as there is no way for any long-range temporal dependencies between historic state occurrences and the current state to be established (Gschwind et al., 2015). While approaches that use Hidden Semi-Markov Models have been proposed, they are limited in the complexity of long-range temporal dependencies they can capture (Trujillo-Barreto et al., 2019). Secondly, HMMs adopt a mutually exclusive state model, meaning that data can only be generated by one set of observation model parameters at any given instant. True brain dynamics might be better modelled by patterns that can flexibly combine and mix over time. The mutual exclusivity constraint was found to lead to errors in inferred functional brain network metrics in Pervaiz et al. (2022).

We set out to address these two limitations in this paper and do so by introducing a new generative model for neuroimaging data. Specifically, we model the time-varying mean and covariance of the data as a linear weighted sum of spatially distributed patterns of activity or “modes”. Notably, we do not impose mutual exclusivity on mode activation. Similarly, we drop the assumption that the dynamics of the modes are a function of a Markovian process. This is achieved by using a unidirectional recurrent neural network (RNN) (Géron, 2019) to model the temporal evolution of the weighted sum. The memory provided by the RNN facilitates a richer context to the changes in the instantaneous mean and covariance than what would be afforded by a traditional HMM.

In this work, we use Bayesian methods (Friston et al., 2007) to infer the parameters of the generative model. With this method, we learn a distribution for each parameter, which allows us to incorporate uncertainty into our parameter estimates. Having observed data, we update the distributions to find likely parameters for the model to have generated the data. In this work, we adapt a method used in variational autoencoders (Kingma and Welling, 2014) to infer the model parameter distributions. One component of this is amortised inference, which works through the deployment of an inference network. In our case the inference network is another RNN, which is bidirectional (Géron, 2019) and learns a mapping from the observed data to the model parameter distributions. The use of an inference network facilitates the scaling and application of this technique to very large datasets, without ever needing (necessarily) to increase the number of inference network parameters to be learnt.

To update our model parameter distributions, we minimise the variational free energy (see Section 2.2) using stochastic gradient descent (Géron, 2019). We do this by sampling from the model parameter distributions using the reparameterization trick (Kingma and Welling, 2014). The ability to estimate the variational free energy by sampling enables us to use sophisticated generative models that include highly non-linear transformations that would not be feasible with classical Bayesian methods. Taken together, we call the generative model and inference framework DyNeMo (Dynamic Network Modes).

2. Methods

In this section we outline the generative model and describe the inference of model parameters. We also describe the datasets and preprocessing steps carried out in this work.

2.1. Generative model

Here we propose a model for generating neuroimaging data that explicitly models functional brain networks, including a metric of their FC, as a dynamic quantity. The model describes time series data using a set of modes, which are constituent elements that can be combined to define time-varying statistics of the data. When trained on neuroimaging data, modes are simply static spatial brain activity patterns that can overlap with each other. We refer to them as “modes” to emphasise that the model is not categorical, i.e. that modes should not be mistaken for mutually exclusive states (as would be the case in an HMM). Similar to an HMM, our generative model has two components: a latent representation and a data generating process given the latent representation, which is referred to as an observation model. In our case, the latent representation is a set of mixing coefficients αt and the observation model is a multivariate normal distribution. The mean and covariance of the multivariate normal distribution are determined by linearly mixing the modes’ spatial models, i.e. means µj and covariances Dj, with the coefficients αt. The mixing coefficients are dynamic in nature whereas the modes are static. Therefore, dynamics in the observed data are captured in the dynamics of the mixing coefficients. The mixing coefficients provide a low-dimensional and interpretable dynamic description of the data, and the modes correspond to static spatial distributions of activity/FC, where mode-specific FC is captured by the between-brain-region correlations in Dj. Both of these quantities are useful for understanding the data. An overview of the generative model is shown in Fig. 1 and a mathematical formulation is given below.

Fig. 1.


Generative model employed in DyNeMo. Historic values of a latent logit time series (solid squares, blue background), θ<t, are fed into a unidirectional model RNN. The output of the model RNN parameterises a normal distribution, p(θt | θ<t), which we sample to predict the next logit, θt (unfilled squares). These logits are transformed via a softmax operation to give the mixing coefficients, αt (unfilled circles). The softmax transformation enforces that the mixing coefficients are positive and sum to one at any instant in time. Separate from the dynamics are the corresponding spatial models that describe brain network activity as a set of modes (depicted in different colours here), via a mean vector, µj, and covariance matrix, Dj. The mode spatial models combine with the dynamic mixing coefficients (linear mixing) to parameterise a multivariate normal distribution with a time-varying mean vector, mt, and covariance matrix, Ct. Note that we do not enforce any constraint on the mode means µj and covariances Dj; this means the modes can overlap in time and space and the overall activity (mt and Ct) can vary. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

At each time point t there is a probabilistic vector of free parameters, referred to as a logit and denoted by θt. The logits are distributed in accordance with a multivariate normal distribution,

$$p(\theta_t \mid \theta_{1:t-1}) = \mathcal{N}\left(\mu_{\theta_t}(\theta_{1:t-1}),\ \sigma^2_{\theta_t}(\theta_{1:t-1})\right), \qquad (1)$$

where θ1:t−1 denotes a sequence of historic logits {θ1, …, θt−1}, μθt is a mean vector and σ²θt is a diagonal covariance matrix. We use a unidirectional RNN to predict future values of μθt and σθt based on previous logits θ1:t−1. The logit at each time point, θt, is sampled from the distribution p(θt | θ1:t−1). The historic values of the logits θ1:t−1 are fed into the RNN:

$$\mu_{\theta_t}(\theta_{1:t-1}) = g_\mu\left(\mathrm{LSTM}(\theta_{1:t-1})\right), \qquad \sigma_{\theta_t}(\theta_{1:t-1}) = \xi\left(g_\sigma(\mathrm{LSTM}(\theta_{1:t-1}))\right), \qquad (2)$$

where gμ and gσ are learnt affine transformations, ξ is a softplus function included to ensure the standard deviations σθt are positive, and LSTM is a type of RNN known as a Long Short Term Memory network (Hochreiter and Schmidhuber, 1997). We refer to this network as the model RNN. The logits θt are used to determine a set of mixing coefficients,

$$\alpha_t = \zeta(\theta_t), \qquad (3)$$

where ζ is a softmax function which assures that the αt values are positive and sum to one. The mixing coefficients are then used together with a set of spatial modes to calculate a time-varying mean vector and covariance matrix:

$$m_t = \sum_{j=1}^{J} \alpha_{jt}\,\mu_j, \qquad C_t = \sum_{j=1}^{J} \alpha_{jt}\,D_j, \qquad (4)$$

where J is the number of modes, µj is the mean vector for each mode, Dj is the covariance matrix for each mode and αjt are the elements of αt.
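To make the generative process concrete, here is a minimal NumPy sketch of Eqs. (1)–(4). A toy linear recurrence stands in for the model LSTM, and all dimensions, parameter values and variable names are illustrative assumptions rather than the paper’s implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only).
J, n_channels, n_samples = 3, 4, 200

# Static mode spatial models: mean vectors mu_j and covariance matrices D_j.
mode_means = rng.normal(size=(J, n_channels))
A = rng.normal(size=(J, n_channels, n_channels))
mode_covs = A @ A.transpose(0, 2, 1) + n_channels * np.eye(n_channels)  # positive definite

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

# Stand-in for the model RNN: a simple linear recurrence over the previous logit.
W = 0.9 * np.eye(J)

theta = np.zeros(J)
alphas = np.empty((n_samples, J))
data = np.empty((n_samples, n_channels))
for t in range(n_samples):
    # Eq. (1): sample the next logit from a normal whose mean depends on history.
    theta = rng.normal(loc=W @ theta, scale=0.1)
    alpha = softmax(theta)                          # Eq. (3): mixing coefficients
    m_t = np.einsum("j,jc->c", alpha, mode_means)   # Eq. (4): time-varying mean
    C_t = np.einsum("j,jcd->cd", alpha, mode_covs)  # Eq. (4): time-varying covariance
    alphas[t] = alpha
    data[t] = rng.multivariate_normal(m_t, C_t)     # observation model
```

Because the αt are positive and sum to one, Ct is a convex combination of positive-definite matrices and is therefore itself a valid covariance at every time point.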

2.2. Inference

In this section we describe the framework employed to infer the parameters of our generative model, namely the logits θt, mode means µj and covariances Dj. In this work, we use variational Bayesian inference to learn the full posterior distribution for θt and point estimates for µj and Dj.

Variational Bayes

In Bayesian inference, we would like to learn a distribution, referred to as the posterior distribution, for the variable we are trying to estimate given some data we have observed. In variational Bayesian inference, we approximate the posterior distribution with a simple distribution, referred to as the variational posterior distribution q(θt), and aim to minimise the Kullback-Leibler (KL) divergence between the variational and true posterior, which amounts to minimising the variational free energy (or equivalently, maximising the evidence lower bound). In classical variational Bayes (Bishop, 2007; Woolrich et al., 2009; Zhang et al., 2018), this involves formulating update rules for the parameters of the variational posterior distribution given some observed data. Deriving these update rules is only made possible by limiting the complexity of the generative model for the observed data and restricting the variational posterior to conjugate distributions. In addition to this, we have a separate variational distribution for each variable we are trying to estimate. Also in classical variational Bayes, we learn the parameters of each variational distribution separately, which becomes problematic in terms of computer memory requirements when we wish to estimate a large number of variables.

In brief, we overcome these difficulties with a technique adapted from variational autoencoders (Kingma and Welling, 2014). This deploys a neural network (which we call the inference network) to perform amortised inference, which helps the approach to scale to large numbers of observations over time; and a sampling technique (known as the reparameterization trick) that allows us to learn a full posterior distribution for θt (Kingma and Welling, 2014). We learn point estimates of µj and Dj using trainable free parameters. We update estimates for µj, Dj, and the posterior distribution parameters of θt, to minimise the variational free energy using stochastic gradient descent.

Logits θt

Focusing on the full posterior inference of the logits θt, here, we use amortised inference (Zhang et al., 2018). This involves using an inference network to learn a mapping from the observed data to the parameters of the variational posterior. The rationale for this approach is that the computation from past inferences can be reused in future inferences. The use of an inference network fixes the number of trainable parameters to the number of internal weights and biases in the inference network. This is usually significantly smaller than the number of time points, which allows us to efficiently scale to bigger datasets.

Inference network

We now describe the inference network in detail. Having observed the time series x1:N, we approximate the variational posterior distribution for θt as

$$q(\theta_t \mid x_{1:N}) = \mathcal{N}\left(m_{\theta_t}(x_{1:N}),\ s^2_{\theta_t}(x_{1:N})\right), \qquad (5)$$

where mθt and s²θt are the variational posterior mean and covariance of a multivariate normal distribution respectively. The variational posterior covariance is a diagonal matrix. We use a bidirectional RNN for the inference network, which we refer to as the inference RNN. This network outputs the parameters of the variational posterior distribution given the observed data:

$$m_{\theta_t}(x_{1:N}) = f_m\left(\mathrm{BLSTM}(x_{1:N})\right), \qquad s_{\theta_t}(x_{1:N}) = \xi\left(f_s(\mathrm{BLSTM}(x_{1:N}))\right), \qquad (6)$$

where fm and fs are affine transformations and BLSTM denotes a bidirectional LSTM. The complete DyNeMo framework and interplay between the generative model and inference network is shown in Fig. 2.
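The mapping in Eq. (6) can be sketched as follows. A symmetric moving average stands in for the bidirectional LSTM, purely to show that the posterior at time t pools information from both past and future data; the affine transformations fm and fs are random placeholders:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))  # the link function xi: ensures positive std devs

rng = np.random.default_rng(1)
N, n_channels, J = 100, 4, 3
x = rng.normal(size=(N, n_channels))  # observed time series x_{1:N}

# Stand-in for the BLSTM: a symmetric moving average, so the representation at
# time t depends on data both before and after t (as with a bidirectional RNN).
kernel = np.ones(5) / 5
pooled = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, x)

# Hypothetical affine transformations f_m and f_s.
W_m, b_m = rng.normal(size=(n_channels, J)), np.zeros(J)
W_s, b_s = rng.normal(size=(n_channels, J)), np.zeros(J)

m_theta = pooled @ W_m + b_m            # posterior means, shape (N, J)
s_theta = softplus(pooled @ W_s + b_s)  # posterior std devs, strictly positive
```

The softplus link guarantees a valid (positive) standard deviation for every time point regardless of the raw network output.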

Fig. 2.


The full DyNeMo framework. A sequence of observed data, x1:N, is fed into a bidirectional RNN which parameterises the approximate variational posterior distribution for the logit time series, q(θt | x1:N). We sample θts from the variational posterior distribution using the reparameterization trick (asterisks, orange background) and feed the samples into the model RNN to predict the prior distribution one time step in the future, p(θt+1 | θ1:t). The prior and posterior distributions are used to calculate the KL divergence term of the variational free energy. The samples from the variational posterior distribution, θts, are also used to generate the observed data by first applying a softmax transformation to calculate the mixing coefficients, αt (unfilled circles, orange background). These mixing coefficients are then combined with the spatial model of each mode, which is a mean vector, µj, and covariance matrix, Dj. This gives an estimate of the time-varying mean, mt, and covariance, Ct, which is used to calculate the negative log-likelihood term of the variational free energy.

Loss function

Having outlined the inference network for the logits, we turn our attention to the loss function used in DyNeMo. In variational Bayesian inference we infer a parameter, in this case θt, by minimising the variational free energy (Friston et al., 2006),

$$F = -\int q(\theta_{1:N} \mid x_{1:N}) \log\left(\frac{p(x_{1:N} \mid \theta_{1:N})\, p(\theta_{1:N})}{q(\theta_{1:N} \mid x_{1:N})}\right) \mathrm{d}\theta_{1:N}, \qquad (7)$$

where p(θ1: N) is the prior and p(x1: N | θ1: N) is the likelihood. With this approach the inference problem is cast as an optimisation problem, which can be efficiently solved with the use of stochastic gradient descent (Géron, 2019). Here, we make stochastic estimates of a loss function, and use the gradient of the loss function to update the trainable parameters in our model. However, to estimate the loss function we must calculate the integral in Eq. (7). In DyNeMo, this is done using a sampling technique (i.e. the reparameterization trick) to give Monte Carlo estimates of the loss function.

Insight into the loss function is gained by re-writing Eq. (7) as two terms (see SI 1.1):

$$F = \mathrm{LL} + \mathrm{KL}. \qquad (8)$$

The first term is referred to as the log-likelihood term and the second term is referred to as the KL divergence term. The log-likelihood term acts to give the most probable estimate for the logits that could generate the training data and the KL divergence term acts to regularise the estimate. Relating this to components of DyNeMo, it is the inference RNN that infers the logits, which together with the learnt mode means and covariances determine the log-likelihood term, whilst the model RNN regularises the inferred logits through its role as the prior in the KL divergence term. It is the temporal regularisation provided by the model RNN that distinguishes DyNeMo from a Gaussian mixture model (GMM). The benefit of including a model RNN for temporal regularisation is discussed in SI 1.4.

We now detail the calculation used to estimate the loss function. The log-likelihood term is given by

$$\mathrm{LL} = -\sum_{t=1}^{N} \log p(x_t \mid \theta^1_t), \qquad (9)$$

where $p(x_t \mid \theta^1_t)$ is the likelihood of generating data xt at time point t given that the latent variable is $\theta^1_t$, which is a sample from the variational posterior q(θt | x1:N). The superscript in $\theta^1_t$ indicates that it is the first sample from q(θt | x1:N). Only one sample from the variational posterior at each time point is used to estimate the log-likelihood term. Note that the likelihood is a multivariate normal whose mean and covariance are determined by Eq. (4). Therefore, the likelihood depends on the logits θt, mode means µj and covariances Dj. The KL divergence term is given by

$$\mathrm{KL} = \sum_{t=2}^{N} D_{\mathrm{KL}}\left(q(\theta_t \mid x_{1:N}) \,\middle\|\, p(\theta_t \mid \theta^1_{1:t-1})\right), \qquad (10)$$

where $p(\theta_t \mid \theta^1_{1:t-1})$ is the prior distribution for θt given a single sample of the previous logits $\theta^1_1, \ldots, \theta^1_{t-1}$ from their respective variational posteriors q(θ1 | x1:N), …, q(θt−1 | x1:N), and DKL is the KL divergence (Bishop, 2007) between the variational posterior and prior. A full derivation of the loss function is given in SI 1.1.
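The two terms can be estimated as follows, using one posterior sample per time point. The multivariate normal log-density and the closed-form KL divergence between diagonal Gaussians are standard results; the posterior/prior parameters and reconstructed statistics below are arbitrary placeholders rather than outputs of a trained model:

```python
import numpy as np

def gaussian_log_likelihood(x, m, C):
    """Log pdf of a multivariate normal N(m, C) evaluated at x."""
    d = x - m
    _, logdet = np.linalg.slogdet(C)
    k = x.shape[-1]
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(C, d))

def kl_diag_gaussians(m_q, s_q, m_p, s_p):
    """Closed-form KL( N(m_q, diag(s_q^2)) || N(m_p, diag(s_p^2)) )."""
    return np.sum(np.log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5)

rng = np.random.default_rng(2)
N, J, n_channels = 50, 3, 4
x = rng.normal(size=(N, n_channels))

# Placeholder posterior/prior parameters and reconstructed mean/covariance.
m_q, s_q = rng.normal(size=(N, J)), np.full((N, J), 0.5)
m_p, s_p = np.zeros((N, J)), np.ones((N, J))
m_t = np.zeros((N, n_channels))
C_t = np.stack([np.eye(n_channels)] * N)

# Eq. (9): negative log-likelihood term, one posterior sample per time point.
LL = -sum(gaussian_log_likelihood(x[t], m_t[t], C_t[t]) for t in range(N))
# Eq. (10): KL regularisation term, summed from t = 2.
KL = sum(kl_diag_gaussians(m_q[t], s_q[t], m_p[t], s_p[t]) for t in range(1, N))
F = LL + KL  # Eq. (8): variational free energy estimate
```

In training, F is computed on mini-batches of sequences and minimised by stochastic gradient descent, with both terms differentiable with respect to the trainable parameters.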

Reparameterization trick

Next, we outline the method used to sample from the variational posterior distribution q(θt | x1:N). This is a multivariate normal distribution with mean vector mθt(x1:N) and diagonal covariance matrix s²θt(x1:N). To obtain a sample θts from q(θt | x1:N), we use the reparameterization trick (Kingma and Welling, 2014), where we sample from a normal distribution,

$$\epsilon^s \sim \mathcal{N}(0, I), \qquad (11)$$

where I is the identity matrix and ϵs denotes the sth sample from 𝒩(0, I). We calculate the samples for the logits as

$$\theta^s_t = m_{\theta_t}(x_{1:N}) + s_{\theta_t}(x_{1:N}) \odot \epsilon^s, \qquad (12)$$

where sθt(x1:N) is a vector containing the square roots of the diagonal of s²θt(x1:N), which multiplies the noise sample element-wise. The use of the reparameterization trick allows us to directly minimise the loss function using stochastic gradient descent.
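A sketch of Eqs. (11) and (12): the randomness is isolated in ϵ, so the sampled logits are a deterministic, differentiable function of the posterior parameters, which is what lets gradients of the loss flow back into the inference network. The parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
J, S = 3, 10000  # number of modes, number of samples (toy values)

# Hypothetical posterior parameters output by the inference network.
m_theta = np.array([0.5, -1.0, 2.0])
s_theta = np.array([0.3, 0.7, 1.2])

# Eq. (11): sample standard normal noise; Eq. (12): shift and scale it.
# Gradients can flow through m_theta and s_theta because the randomness
# is confined to eps.
eps = rng.standard_normal(size=(S, J))
theta_samples = m_theta + s_theta * eps
```

The empirical mean and standard deviation of the samples recover mθt and sθt, confirming the samples are distributed as the intended posterior.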

Mode means µj and covariances Dj

Having detailed the inference of the logits θt and the calculation of the loss function, we now turn our attention to the spatial models described by the means µj and covariances Dj. We performed fully Bayesian inference on the logits, as they are temporally local parameters, and hence will have reasonably large amounts of uncertainty in their estimation which needs to be propagated to the inference of θt over time. By contrast, the mode means µj and covariances Dj are global parameters whose inference can draw on information over all time points. As a result we choose to use point estimates for µj and Dj, which are learnt using trainable free parameters. Additionally, learning point estimates when they are sufficient has the advantage of simplifying inference.

The time-varying mean vector mt constructed from the mode means µj can take on any value, and can therefore be treated as free parameters. However, the time-varying covariance Ct constructed from the Dj matrices is required to be positive definite. We enforce this by parameterising the Dj’s using the Cholesky decomposition,

$$D_j = L_j L_j', \qquad (13)$$

where Lj is a lower triangular matrix known as a Cholesky factor and ′ denotes the matrix transpose. We learn Lj as a vector of free parameters that is used to fill a lower triangular matrix. We also apply a softplus operation and add a small positive value to the diagonal of the Cholesky factor to improve training stability. Using this approach, we learn point estimates for the mode means and covariances.
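The parameterisation of Eq. (13) can be sketched as below; the softplus-plus-jitter treatment of the diagonal follows the description above, while the helper name and jitter size are assumptions:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def covariance_from_flat(flat, n):
    """Build a positive-definite covariance D = L L' from unconstrained parameters."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = flat          # fill the lower triangle from a flat vector
    diag = softplus(np.diag(L)) + 1e-6    # positive diagonal plus a small jitter
    L[np.diag_indices(n)] = diag
    return L @ L.T                        # Eq. (13): Cholesky factorisation

rng = np.random.default_rng(4)
n = 4
flat = rng.normal(size=n * (n + 1) // 2)  # free parameters to be learnt
D = covariance_from_flat(flat, n)
```

Since the diagonal of L is forced positive, L is invertible and D = LL′ is guaranteed symmetric positive definite for any setting of the free parameters, so gradient descent can update them without constraints.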

Hyperparameters, initialisation and training

The full DyNeMo model contains several hyperparameters, for example the number of layers and hidden units in the RNNs, the batch size, the learning rate, and many more. These all must be specified before training the model. DyNeMo also contains a large number of trainable parameters, which must be initialised. A description of the hyperparameters and the initialisation of trainable parameters is given in SI 1.2. Hyperparameters for each dataset used in this work are summarised in Table 1. There are also several techniques that can be used to improve model training, such as KL annealing (Bowman et al., 2015) and using multiple starts. These are also discussed in detail in SI 1.2.

Table 1. Hyperparameters (see SI 1.2) used in simulation and real data studies.
Hyperparameter Simulation 1 Simulation 2 MEG Data
Number of modes, J 3 6 10
Sequence length, N 200 200 200
Inference RNN hidden units 64 64 64
Model RNN hidden units 64 64 64
KL annealing sharpness, As 10 10 10
KL annealing epochs, nAE 100 100 300
Training epochs nE 200 200 600
Batch size 16 16 32
Learning rate, η 0.01 0.01 0.0025
Gradient clip (norm.) - - 0.5
Number of multi-starts - - 10
Multi-start epochs - - 20
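As an illustration of KL annealing, the sketch below ramps the weight on the KL term from roughly 0 to 1 over the annealing epochs, with a sharpness parameter controlling how abrupt the ramp is. This tanh-shaped schedule is one common choice and is an assumption here; the schedule actually used is described in SI 1.2:

```python
import numpy as np

def kl_annealing_factor(epoch, n_annealing_epochs, sharpness):
    """Weight applied to the KL term: ramps from ~0 to 1 over the annealing
    epochs so early training focuses on fitting the data (the LL term).
    The tanh shape is an illustrative assumption, not the paper's exact form."""
    if epoch >= n_annealing_epochs:
        return 1.0
    x = sharpness * (epoch - 0.5 * n_annealing_epochs) / n_annealing_epochs
    return float(0.5 * (np.tanh(x) + 1.0))

# e.g. 100 annealing epochs with sharpness 10, as in the simulation settings.
weights = [kl_annealing_factor(e, 100, 10) for e in range(120)]
```

Starting with a near-zero KL weight prevents the posterior from collapsing onto the prior before the observation model has learnt anything useful.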

2.3. Datasets

In this section, we describe the data used to train DyNeMo. This includes simulated data, described in Sections 2.3.1 and 2.3.2, which was used to evaluate DyNeMo’s modelling and inference capabilities, and real MEG data, described in Section 2.3.3, which was used for neuroscientific studies.

2.3.1. Simulation 1: Long-range dependencies

The first simulation dataset was used to examine DyNeMo’s ability to learn long-range temporal dependencies in the underlying logits. In simulation 1, data were generated using a Hidden Semi-Markov Model (HSMM) (Yu, 2010). Unlike an HMM, state lifetimes are explicitly modelled in an HSMM. This enables us to specify a lifetime distribution where long-lived states are probable. We train DyNeMo on this data and examine samples from the generative model, in this case we sample the model RNN. The lifetime distribution of the sampled states indicates the memory of the model RNN, i.e. the time scale of temporal dependencies it has learnt. If samples from DyNeMo show long-lived states that cannot be generated with an HMM, we say DyNeMo has learnt long-range temporal dependencies. In simulation 1, we used a gamma distribution (with shape and scale parameters of 5 and 10 respectively) to sample state lifetimes. We use a transition probability matrix with self-transitions excluded to determine the sequence of states to sample a lifetime for. The transition probability matrix and ground truth mode covariances are shown in Fig. 4a and b respectively. A multivariate time series with 11 channels, 25,600 samples and 3 hidden states was generated using an HSMM simulation with a multivariate normal observation model. A zero mean vector was used for each mode and covariances were generated randomly. The ground truth state time course and lifetime distribution of this simulation are shown in Fig. 4c and d respectively.
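The HSMM state sequence of simulation 1 can be sketched as follows: lifetimes are drawn explicitly from a gamma distribution (shape 5, scale 10) and the next state is drawn from a transition matrix with self-transitions excluded. The observation model (a zero-mean multivariate normal per state) is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(5)
n_states, n_samples = 3, 25600

# Transition matrix with self-transitions excluded, uniform over other states.
P = (np.ones((n_states, n_states)) - np.eye(n_states)) / (n_states - 1)

states = np.empty(n_samples, dtype=int)
state, t = 0, 0
while t < n_samples:
    # Explicit state lifetime drawn from a gamma distribution (mean = 50 samples).
    lifetime = max(1, int(rng.gamma(shape=5, scale=10)))
    states[t:t + lifetime] = state
    t += lifetime
    state = rng.choice(n_states, p=P[state])  # next state is never the current one
```

Because lifetimes are sampled directly, this process produces long-lived states far more often than a geometric (Markovian) lifetime distribution would.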

Fig. 4.


DyNeMo is able to learn long-range temporal dependencies in the latent dynamics of simulated data. Parameters of an HSMM simulation are shown along with the parameters inferred by DyNeMo and an HMM. While both DyNeMo and the HMM were able to accurately infer the hidden state time course and their lifetime distributions, actual samples from each model show that only DyNeMo was able to learn the lifetime distribution of the states within its generative model, demonstrating its ability to learn long-range temporal dependencies. a) Transition probability matrix used in the simulation. b) Covariances: simulated (top), inferred by DyNeMo (middle) and inferred by an HMM (bottom). c) State time courses: simulated (top), inferred by DyNeMo (middle) and inferred by an HMM (bottom). Each colour corresponds to a separate state. d) Lifetime distribution of inferred state time courses. e) Lifetime distribution of sampled state time courses. The fractional occupancy of each state is shown as a percentage in each histogram plot.

2.3.2. Simulation 2: Linear mode mixing

The second simulation dataset was used to examine DyNeMo’s ability to infer a linear mixture of co-activating modes. Here, we simulated a set of J sine waves with different amplitudes, frequencies and initial phases to represent the logits θt. We applied a softmax operation at each time point to calculate the ground truth mixing coefficients αt. A multivariate normal distribution with zero mean and randomly generated covariances was used for the observation model. A multivariate time series with 80 channels, 25,600 samples and 6 hidden modes was simulated. The first 2000 time points of the simulated logits and mixing coefficients are shown in Fig. 5a and b respectively.
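The ground truth of simulation 2 can be sketched as sine-wave logits pushed through a softmax; the amplitude, frequency and phase ranges below are illustrative choices, not the exact values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)
J, n_samples = 6, 25600
t = np.arange(n_samples)

# Sine-wave logits with different amplitudes, frequencies and initial phases
# (ranges are hypothetical).
amplitudes = rng.uniform(0.5, 2.0, size=J)
frequencies = rng.uniform(0.001, 0.01, size=J)  # cycles per sample
phases = rng.uniform(0, 2 * np.pi, size=J)
logits = amplitudes * np.sin(2 * np.pi * frequencies * t[:, None] + phases)

# Softmax across modes at each time point gives the ground-truth mixing
# coefficients: all modes are partially active at once (no mutual exclusivity).
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
alphas = exp_logits / exp_logits.sum(axis=1, keepdims=True)
```

Every αjt is strictly positive at every time point, so the simulated data are always generated by a genuine mixture of modes, which an HMM with mutually exclusive states cannot represent.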

Fig. 5.


DyNeMo is able to accurately infer a linear mixture of modes. DyNeMo was trained on a simulation with co-activating modes. The mixing coefficients inferred by DyNeMo follow the same pattern as the ground truth. The failure of an HMM in modelling this type of simulation due to its inherent assumption of mutual exclusivity is also shown. a) Logits used to simulate the training data. b) Mixing coefficients of the simulation (top) and inferred by DyNeMo (bottom). c) State time course inferred by an HMM. d) Riemannian distance between the reconstruction of the time-varying covariance, Ct, (via Eq. (4)) and the ground truth for DyNeMo and the HMM. Only the first 2000 time points are shown in each plot.

2.3.3. MEG data

In addition to the simulation datasets, we trained DyNeMo on two real MEG datasets: a resting-state and a (visuomotor) task dataset. The MEG datasets were source reconstructed to 42 regions of interest. The raw data, preprocessing and source reconstruction are described below.

Raw data and preprocessing

Data from the UK MEG Partnership were used in this study. The data were acquired using a 275-channel CTF MEG system operating in third-order synthetic gradiometry at a sampling frequency of 1.2 kHz. Structural MRI scans were acquired with a Philips Achieva 7 T. MEG data were preprocessed using the OHBA software library (OSL). The time series was downsampled to 250 Hz before a notch filter at 50 Hz (and harmonics) was used to remove power line noise. The data were then bandpass filtered between 1 and 98 Hz. Finally, an automated bad segment detection algorithm in OSL was used to remove particularly noisy segments of the recording. No independent component analysis was applied to identify artefacts.

Source reconstruction

Structural data were coregistered with the MEG data using an iterative closest-point algorithm; digitised head points acquired with a Polhemus pen were matched to individual subjects’ scalp surfaces extracted with FSL’s BET tool (Jenkinson et al., 2005; Smith, 2002). We used the local spheres head model in this work (Huang et al., 1999). Preprocessed sensor data were source reconstructed onto an 8 mm isotropic grid using a linearly constrained minimum variance beamformer (Van Veen and Buckley, 1988). Voxels were then parcellated into 42 anatomically defined regions of interest, before a time series for each parcel was extracted by applying Principal Component Analysis (PCA) to each region of interest. We use the same 42 regions of interest as Vidaurre et al. (2018); see the supplementary information of Vidaurre et al. (2018) for a list of the regions used and their MNI coordinates. Source reconstruction can lead to artefactual correlations between parcel time courses, referred to as source leakage. This is a static effect so it should not affect the inference of dynamics. However, it can affect the inferred FC. We minimise source leakage using the symmetric multivariate leakage reduction technique described in Colclough et al. (2015), which unlike pairwise methods has the benefit of reducing leakage caused by so-called ghost interactions (Palva et al., 2018). We will refer to each parcel as a channel.

Resting-state dataset

The resting-state dataset is formed from the MEG recordings of 55 healthy participants (mean age 38.3 years, maximum age 62 years, minimum age 19 years, 27 males, 50 right handed). The participants were asked to sit in the scanner with their eyes open while 10 min of data were recorded.

Task dataset

The task dataset is formed from MEG recordings of 51 healthy participants (mean age 38.4 years, maximum age 62 years, 24 males, 46 right handed). The recordings were taken while the participants performed a visuomotor task (Hunt et al., 2019). Participants were presented with a high-contrast grating (visual stimulus). The grating remained on screen for a jittered duration between 1.5 and 2 s. When the grating was removed, the participants performed an abduction using the index finger and thumb of the right hand. This abduction response was measured using an electromyograph on the back of the hand. An 8 s inter-trial interval followed the grating removal before the grating reappeared on the screen. The structure of the task is shown in Fig. 3. A total of 1837 trials are contained in this dataset. The majority of participants in the UK MEG Partnership study have both resting-state and task recordings; 48 participants appear in both the resting-state and task datasets.

Fig. 3.

Fig. 3

The structure of the visuomotor task. Participants are presented with a visual stimulus, which is an onscreen grid. After a period of between 1.5 and 2 s, the grid is removed. Upon grid removal, the participant performs a right-hand index finger abduction. Between the removal of the grid and its reappearance for the next trial, there is an 8 s inter-trial interval.

Data preparation

Before training DyNeMo, we further prepare the preprocessed data by performing the following steps. The first step is used to encode spectral information into the observation model (see Figure S1), whereas the other two are to help train the model. These steps are optional and were only performed on the MEG datasets. The steps are:

  1. Time-delay embedding. This involves adding extra channels with time-lagged versions of the original data. We use 15 embeddings, which results in a total of 630 channels. By doing this, we introduce additional off-diagonal elements to the covariance matrix, which contain the covariance of a channel with time-lagged versions of itself. These elements of the covariance matrix sample the autocorrelation function of the channel at the given lags (Papoulis and Saunders, 1989). As the autocorrelation function captures the spectral properties of a signal, this allows the model to learn spectral features of the data as part of the covariance matrix.

  2. PCA. After time-delay embedding we are left with 630 channels, which is too many for modern GPUs to hold in memory. Therefore, we use PCA for dimensionality reduction down to 80 channels.

  3. Standardisation (z-transform) across the time dimension. This is a common transformation that has been found to be essential in many optimisation problems (Géron, 2019). Standardisation is the final step in preparing the training data.3

Time-delay embedding and PCA are summarised in Figure S1. We train DyNeMo to generate the prepared MEG data, i.e. the 80 channel time series after time-delay embedding and PCA, rather than the 42 channel time series of source reconstructed data.
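The three preparation steps above can be sketched in a few lines of Python. This is a minimal illustration with made-up array sizes and variable names; the actual pipeline used by the authors may differ in detail (e.g. in how edge samples are handled):

```python
import numpy as np
from sklearn.decomposition import PCA

def time_delay_embed(x, n_embeddings=15):
    """Stack time-lagged copies of each channel as extra channels.

    x : (n_samples, n_channels) array.
    Returns an array of shape
    (n_samples - n_embeddings + 1, n_channels * n_embeddings).
    """
    n_samples, _ = x.shape
    return np.concatenate(
        [x[lag : n_samples - n_embeddings + 1 + lag]
         for lag in range(n_embeddings)],
        axis=1,
    )

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 42))               # stand-in for 42-parcel data

x_te = time_delay_embed(x, n_embeddings=15)       # 42 * 15 = 630 channels
x_pca = PCA(n_components=80).fit_transform(x_te)  # reduce to 80 channels

# Standardise (z-transform) across the time dimension
x_std = (x_pca - x_pca.mean(axis=0)) / x_pca.std(axis=0)
```

The 80-channel `x_std` array plays the role of the prepared training data described in the text.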

2.4. Post-hoc analysis of learnt latent variables

In this work, we set each mode’s mean vector, µj, to zero and do not update its value during training. This is due to our choice of training data. In the simulation datasets, we simulated modes with a zero mean vector so there is no need to model the mean. In the MEG datasets, we train on time-delay embedded data. Here, we want all the spectral information to be contained in the mode covariance matrices, therefore we set the means to zero. Additionally, we would like to compare our results to those presented in Vidaurre et al. (2018), which trained an HMM without learning the mean. In this work, we use DyNeMo to learn the mixing coefficients, αt, (via the logits, θt) and the mode covariances, Dj.

DyNeMo provides a variational posterior distribution q(θt | x1: N) at each time point. To simplify analysis we take the most probable value for θt (this is known as the maximum a posteriori probability estimate) and use this to calculate the inferred mode mixing coefficients, αt, which contain a description of latent dynamics in the training data.4
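The mapping from the logits to the mixing coefficients is a softmax, which enforces the sum-to-one constraint on the modes. A small sketch with hypothetical MAP logit values (the array contents here are illustrative, not inferred values):

```python
import numpy as np

def softmax(theta):
    """Map logits to mixing coefficients that are positive and sum to one."""
    e = np.exp(theta - theta.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical MAP logits for 3 time points and 4 modes:
theta_map = np.array([[ 2.0, 0.1, -1.0, 0.5],
                      [ 0.0, 0.0,  0.0, 0.0],
                      [-3.0, 4.0,  1.0, 0.2]])
alpha = softmax(theta_map)  # inferred mixing coefficients, one row per time point
```

Equal logits (the second row) give equal mixing coefficients, and increasing one mode's logit necessarily decreases the other modes' coefficients.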

We can use the inferred mode mixing coefficients to estimate quantities that characterise the training data. We describe such analyses in detail in SI 1.3. Quantities calculated in the post-hoc analyses include: summary statistics that characterise the temporal properties of each mode, such as activation lifetimes, interval times and fractional occupancies; power spectra that characterise the spectral properties of each mode; and power/FC maps that characterise the spatial pattern of each mode. Note, we only use the inferred mixing coefficients (and the source reconstructed data) in the post-hoc analysis; the mode covariances are not used.

3. Results

3.1. Simulation 1: Long-range dependencies

A simulation dataset was used to examine DyNeMo’s ability to learn long-range temporal dependencies. DyNeMo was trained on the simulation dataset described in Section 2.3.1. An HMM was also trained on the simulated data for comparison. In this simulation, a mutually exclusive hidden state was used to generate the training data. The ground truth hidden state time course is shown in Fig. 4c. DyNeMo was able to correctly infer mutually exclusive modes, which we can think of as states. The DyNeMo and HMM inferred state time courses are also shown in Fig. 4c. Both DyNeMo and the HMM are able to infer the presence of long-range dependencies by matching the ground truth, non-exponential, state lifetime distributions (shown in Fig. 4d). A dice coefficient (model inferred vs ground truth) of greater than 0.99 is achieved for both models. However, this does not mean that the HMM or DyNeMo generative models have necessarily learnt long-range dependencies, as the inferred state time courses could be a result of purely data-driven information. To test this, we can sample state time courses from the trained HMM and DyNeMo generative models and examine their lifetime distributions. Fig. 4e shows the lifetime distributions of the sampled state time courses. The state lifetime distribution of the sample from DyNeMo captures the non-exponential ground truth distribution, demonstrating its ability to learn long-range temporal dependencies over the scale of at least 50 samples. Contrastingly, the HMM was not able to generate any long-range temporal dependencies, indicating that, as expected, it is only able to capture short-range dependencies.

3.2. Simulation 2: Linear mode mixing

In contrast to the mutual exclusivity assumption of the HMM, DyNeMo has the ability to infer a linear mixture of modes. To test this, we trained DyNeMo and, for comparison, the HMM on the simulation dataset described in Section 2.3.2. Fig. 5b shows the simulated mixing coefficients and those inferred by DyNeMo. For comparison, the state time course inferred by an HMM is also shown in Fig. 5c. As the HMM is a mutually exclusive state model, it is unable to infer a linear mixture of modes, whereas DyNeMo’s mixing coefficients estimate the ground truth very well, demonstrating its ability to learn a mixture of modes. Using the inferred mixing coefficients or state time course along with the inferred covariances, we can reconstruct the time-varying covariance, Ct, of the training data. The Riemannian distance between the reconstruction and ground truth is shown in Fig. 5d. The mean Riemannian distance for DyNeMo is 1.5, whereas it is 11.9 for the HMM. Using a paired t-test the difference is significant with a p-value < 10−5. The smaller Riemannian distance indicates DyNeMo is a more accurate model for the time-varying covariance.
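The (affine-invariant) Riemannian distance between two symmetric positive definite covariance matrices can be computed from their generalised eigenvalues. A minimal sketch, assuming this is the metric used in the comparison above:

```python
import numpy as np
from scipy.linalg import eigvalsh

def riemannian_distance(A, B):
    """Affine-invariant Riemannian distance between symmetric positive
    definite matrices: sqrt(sum_i log^2(lambda_i)), where lambda_i are the
    generalised eigenvalues solving A v = lambda B v."""
    lam = eigvalsh(A, B)  # generalised eigenvalue problem
    return np.sqrt(np.sum(np.log(lam) ** 2))

A = np.array([[2.0, 0.3],
              [0.3, 1.0]])
d_same = riemannian_distance(A, A)           # identical matrices: distance 0
d_scaled = riemannian_distance(A, np.e * A)  # scaling by e gives sqrt(dim)
```

Unlike the Euclidean (Frobenius) distance, this metric respects the geometry of the space of covariance matrices, which is why it is a natural choice for comparing reconstructed and ground-truth time-varying covariances.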

3.3. Resting-State MEG data

DyNeMo identifies plausible resting-state networks

Fig. 6 shows the power maps, FC maps and power spectral densities (PSDs) of 10 modes inferred by DyNeMo when trained on the resting-state MEG dataset described in Section 2.3.3. For the PSDs, we plot the regression coefficients P j (f) to highlight differences relative to the mean PSD P 0(f) common to all modes. Mode 1 appears to be a low-power background network and does not show any large deviations in power from the mean PSD for any frequency. Modes 2–10 show high power localised to specific regions associated with functional activity (see Laird et al., 2011 for an overview of the functional association of different brain networks). Regions with high power also appear to have high FC. Modes 2 and 3 show power in regions associated with visual activity. Mode 4 shows power in parietal regions and can be associated with the posterior default mode network (see Fig. 11). Mode 5 shows power in the sensorimotor region. Modes 6–8 show power in auditory/language regions. Modes 2–8 show power in the alpha band (8–12 Hz) and modes 4–6 and 8 include power at higher frequencies in the beta band (15–30 Hz). Mode 9 shows power in fronto-parietal regions and is recognised as an executive control network. Mode 10 shows power in frontal regions which can be associated with the anterior default mode network. Modes 9 and 10 exhibit low-frequency oscillations in the delta/theta band (1–7 Hz). The PSD of each mode is consistent with the expected oscillations at the high-power regions in each mode (Capilla et al., 2022). A comparison with states inferred with this dataset using an HMM is presented in the section “Large-scale resting-state networks can be formed from a linear mixture of modes”.

Fig. 6.

Fig. 6

DyNeMo infers modes that form plausible resting-state MEG networks. Ten modes were inferred using resting-state MEG data from 55 subjects. Mode 1 appears to be a low-power background network, whereas modes 2–10 show high power in areas associated with functional networks. Modes are grouped in terms of their functional role. Each box shows the power map (left), FC map (middle) and PSD relative to the mean averaged over regions of interest (right) for each group. The top two views on the brain in the power map plots are lateral surfaces and the bottom two are medial surfaces. The shaded area in the PSD plots shows the standard error on the mean.

Fig. 11. HMM states can be represented as a linear mixture of modes.

Fig. 11

a) Correlation of HMM state time courses with DyNeMo mode mixing coefficient time courses. The dynamics of multiple mode time courses correlate with each HMM state time course. In particular, many modes co-activate with the posterior default mode network (DMN) state. All elements are significant with a p-value < 0.05. b) Percentage of HMM state power explained by each DyNeMo mode for the posterior and anterior DMN. This was calculated as ⟨αjt⟩ Tr(Dj)/Tr(Hi), where Dj (Hi) is the DyNeMo (HMM) covariance for mode j (state i) and ⟨αjt⟩ is the time-averaged mixing coefficient for mode j when state i is active. This shows all modes contribute to some extent to the power in these HMM states. c) The cumulative explained power for each HMM state. The modes were re-ordered in terms of increasing contribution before calculating the cumulative sum. Error bars are too small to be seen.

Power maps are reproducible across two split-halves of the dataset

To assess the reproducibility of modes across datasets, we split the full dataset into two halves of 27 subjects. We assess the reproducibility of the modes across halves using the RV coefficient (Yang et al., 2008), which is a generalisation of the squared Pearson correlation coefficient. We match the modes across halves in a pairwise fashion using the RV coefficient as a measure of similarity. Fig. 7 shows the power maps of the matched modes. In general, the same regions are active in each pair of modes and the functional networks are reproducible across datasets. The main difference is small changes in how power is distributed across the visual network modes (mode 4) and across the temporal/frontal regions (mode 9).

Fig. 7.

Fig. 7

Power maps are reproducible across two split-halves of a dataset. Each half of the dataset contains the resting-state MEG data of 27 subjects. Power maps are shown for the first half of the dataset (top) and second half of the dataset (middle). The RV coefficient of the inferred covariances from each half for a given mode (bottom) is also shown. The modes were matched in terms of their RV coefficient. Pairing the modes from each half we see the same functional networks are inferred. These networks also match the modes inferred on the full dataset of 55 subjects, suggesting these networks are reproducible across datasets. The top two views on the brain in each power map plot are lateral surfaces and the bottom two are medial surfaces.

Mode activations are anti-correlated with a background mode and modes with activity in similar regions co-activate

A subset of the inferred mixing coefficients is shown in Fig. 8. Fig. 8a shows the raw mixing coefficients inferred directly from DyNeMo. However, these mixing coefficients do not account for differences in the relative magnitude of each mode covariance. For example, a mode with a small mixing coefficient may still be a large contributor to the time-varying covariance if the magnitude of its mode covariance is large. We can account for this by weighting the raw mixing coefficients with the trace of the corresponding mode covariance. We also normalise the weighted mixing coefficient time course by dividing by the sum over all modes at each time point to maintain the sum-to-one constraint. Fig. 8b and c show these normalised weighted mixing coefficients. Once we account for the magnitude of the mode covariances, we see each mode’s contribution to the time-varying covariance is roughly equal. We show the state time course inferred by an HMM in Fig. 8d for comparison. Fig. 8e shows the correlation between the raw mixing coefficients αjt for each mode. Modes 2–10 appear to be anti-correlated with mode 1. This arises due to the softmax operation (Eq. (23) in SI 1.2) that constrains the mixing coefficients to sum to one. For a mode to activate by contributing more to the time-varying covariance, another mode’s contribution must decrease. The anti-correlation of mode 1 with every other mode suggests that it is primarily this mode’s contribution that is decreased. This suggests that mode 1 can be thought of as a background mode that is deactivated by the other modes.
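The weighting and renormalisation described above can be sketched as follows (a minimal illustration with toy arrays; function and variable names are our own):

```python
import numpy as np

def normalised_weighted_alpha(alpha, covariances):
    """Weight raw mixing coefficients by the trace of each mode covariance,
    then renormalise so they sum to one at each time point.

    alpha       : (n_samples, n_modes) raw mixing coefficients.
    covariances : (n_modes, n_channels, n_channels) mode covariances.
    """
    traces = np.trace(covariances, axis1=1, axis2=2)  # (n_modes,)
    weighted = alpha * traces[None, :]
    return weighted / weighted.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
alpha = rng.dirichlet(np.ones(3), size=100)            # toy sum-to-one mixing
covs = np.stack([np.eye(4) * s for s in (1.0, 5.0, 10.0)])
alpha_nw = normalised_weighted_alpha(alpha, covs)
```

A mode whose covariance has a large trace (large overall variance) is up-weighted, so `alpha_nw` better reflects each mode's contribution to the time-varying covariance than the raw coefficients.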

Fig. 8. DyNeMo provides a mode description of resting-state MEG data.

Fig. 8

a) Raw mixing coefficients αjt inferred by DyNeMo for one subject. b) Mixing coefficients αjt weighted by the trace of each mode covariance and normalised to sum to one at each time point. c) Zoomed in normalised weighted mixing coefficients αjtNW for the first 5 s. d) HMM state time course for the first 5 s for comparison. The power/FC maps and PSDs for the HMM states are shown in Figure S7. e) Correlation between the raw mixing coefficients αjt for different modes j. Ordering is the same as Fig. 6. We see DyNeMo’s description of the data is a set of co-existing modes whose contribution to the time-varying covariance fluctuates. Once weighted by the covariance matrices we see each mode has a more equal contribution. We also see modes 2–10 are anti-correlated with mode 1 and modes with activation in similar regions, e.g. modes 2, 3 and 4, are correlated.

DyNeMo reveals short-lived (100–150 ms) mode activations

Using a GMM to define when a mode is active, we calculate summary statistics such as lifetimes, intervals and fractional occupancies. Mode activation time courses and summary statistics are shown in Fig. 9. Mode 1 appears to have long activation lifetimes and a high fractional occupancy, which is consistent with the description of it being a background network that is largely present throughout. Modes 2–10 have mean lifetimes of approximately 100–150 ms, which is slightly longer than the state lifetimes obtained from an HMM, which are in the range 50–100 ms (Vidaurre et al., 2018). Both models reveal transient networks with lifetimes on the order of 100 ms, suggesting that this is a plausible time scale for these functional networks in resting-state MEG data, confirming that the short lifetimes previously found by the HMM are not likely to be caused by the mutual exclusivity assumption.
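The GMM-based activation detection and the lifetime summary statistic can be sketched as below. This is a simplified illustration of the thresholding idea (a two-component GMM on one mode's mixing coefficient time course), not the authors' exact implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mode_activations(alpha_j):
    """Binarise one mode's mixing coefficient time course by fitting a
    two-component GMM and labelling samples assigned to the higher-mean
    component as 'active'."""
    gmm = GaussianMixture(n_components=2, random_state=0)
    labels = gmm.fit_predict(alpha_j.reshape(-1, 1))
    active_label = int(np.argmax(gmm.means_.ravel()))
    return labels == active_label

def lifetimes(active, sampling_frequency=250):
    """Durations (in seconds) of consecutive runs of 'active' samples."""
    padded = np.concatenate([[0], active.astype(int), [0]])
    changes = np.diff(padded)
    onsets = np.where(changes == 1)[0]
    offsets = np.where(changes == -1)[0]
    return (offsets - onsets) / sampling_frequency

# Toy binary activation time course with runs of 2 and 3 samples:
active = np.array([0, 1, 1, 0, 1, 1, 1, 0], dtype=bool)
lt = lifetimes(active, sampling_frequency=250)
```

Interval times follow the same run-length logic applied to the inactive periods, and fractional occupancy is simply `active.mean()`.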

Fig. 9. DyNeMo reveals short-lived mode activations with lifetimes of 100–150 ms.

Fig. 9

a) Mode activation time courses. Turquoise regions show when a mode is “active”. Only the first 5 s of each mode activation time course for the first subject is shown. b) GMM fits used to identify mode “activations”. Distribution over activations and subjects of c) mode activation lifetimes and d) intervals. e) Distribution over subjects of fractional occupancies. We see mode 1 has a significantly longer mean lifetime (approximately 400 ms) compared to the other modes (approximately 100–150 ms). There is also a wide distribution of fractional occupancies across subjects.

DyNeMo learns long-range temporal correlations

Latent temporal correlations in MEG data can be seen by examining the inferred mixing coefficients, which are shown in Fig. 8. A process is considered to possess long-range temporal correlations if its autocorrelation function decays sufficiently slowly (usually measured relative to an exponential decay) (Linkenkaer-Hansen et al., 2001; Meisel et al., 2017). The autocorrelation function and PSD form a Fourier transform pair, therefore, we can examine the presence of long-range temporal correlations by looking at the PSD. Fig. 10b (top left) shows the PSD of the inferred mixing coefficients. The PSDs are rapidly decaying with a 1/f-like spectrum. This indicates the autocorrelation function must have a slow decay, suggesting the presence of long-range temporal correlations. As in Section 3.1, this does not mean that DyNeMo’s generative model has necessarily learnt long-range dependencies, as the presence of long-range temporal correlations could be a result of purely data-driven information. We can examine if the generative model in DyNeMo was able to learn these long-range temporal correlations by sampling a mixing coefficient time course from the model RNN. Fig. 10a shows a sampled mixing coefficient time course. The PSD of the mixing coefficient time course sampled from the model RNN, Fig. 10b (bottom left), shows the same 1/f-like spectrum as the inferred mixing coefficient time course, demonstrating it was able to learn long-range temporal correlations in the data. This is in contrast to an HMM, where the PSD of the inferred state time course, Fig. 10b (top right), shows long-range temporal correlations, but the PSD of a sampled state time course, Fig. 10b (bottom right), does not. It is also worth noting that the inferred long-range temporal correlations for the HMM are also less strong than for DyNeMo.
This implies that the DyNeMo inferred long-range temporal correlations are not purely data driven, but also come from knowledge about long-range temporal correlations captured by DyNeMo through gathering information across the whole dataset. Note, although the HMM was not able to learn long-range temporal correlations, it was still able to infer them. This is because the inference depends on both the model and the data. Despite the limited memory in the HMM, there is sufficient information coming from the data to infer long-range temporal correlations in the states.
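A PSD of the standardised mixing coefficient time courses of the kind shown in Fig. 10b can be estimated with Welch's method. The sketch below uses random data as a stand-in for the inferred coefficients; the sampling frequency and window length are illustrative, not necessarily those used by the authors:

```python
import numpy as np
from scipy.signal import welch

fs = 250.0                          # assumed sampling frequency (Hz)
rng = np.random.default_rng(1)
alpha = rng.random((5000, 10))      # stand-in for inferred mixing coefficients

# Standardise (z-transform) each mixing coefficient time course across time,
# then estimate its PSD with Welch's method. Long-range temporal correlations
# would appear as a 1/f-like rise towards low frequencies.
alpha_z = (alpha - alpha.mean(axis=0)) / alpha.std(axis=0)
f, psd = welch(alpha_z, fs=fs, nperseg=int(2 * fs), axis=0)
```

For white noise, as here, the PSD is flat; a sampled mixing coefficient time course from a model that has learnt long-range temporal correlations would instead show the decaying 1/f-like spectrum described above.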

Fig. 10. DyNeMo learns long-range temporal correlations in resting-state MEG data.

Fig. 10

a) Normalised weighted mixing coefficients sampled from the DyNeMo model RNN trained on resting-state MEG data. b) PSD of the sampled and inferred normalised weighted mixing coefficients from DyNeMo and sampled and inferred state time courses from an HMM. The red dashed line in b) shows statistically significant frequencies (p-value < 0.05) when comparing the inferred time courses with a sample from the HMM using a paired t-test. The mixing coefficient time course sampled from the DyNeMo model RNN resembles the inferred mixing coefficient time course and shows a similar PSD. Contrastingly, the sampled state time course from an HMM does not have the same temporal correlations as the inferred state time course, which is demonstrated by the flat PSD for the sample. Each mixing coefficient time course was standardised (z-transformed) across the time dimension before calculating the PSD. The fractional occupancy in a 200 ms window was used to calculate the PSD of the HMM state time courses, see Baker et al. (2014). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Large-scale resting-state networks can be formed from a linear mixture of modes

The mixture model in DyNeMo allows it to construct large-scale patterns of covariance using a combination of modes with localised activity. This can be seen by comparing the modes inferred by DyNeMo with states that reveal large-scale networks inferred by an HMM. An HMM was trained on the same resting-state dataset. Power maps, FC maps and PSDs of the HMM states are shown in Fig. S7. Two important networks identified by the HMM are the anterior and posterior default mode networks (states 1 and 2). The power map for DyNeMo mode 10 (see Fig. 6) resembles the anterior state, however, there is no single mode that resembles the posterior state. Fig. 11a shows the correlation of HMM state time courses with DyNeMo mode mixing coefficient time courses. We can see the modes that are most correlated with a state time course have activity in similar locations. Focusing on the default mode network states, DyNeMo mode 4 is most correlated with the posterior state and mode 10 is most correlated with the anterior state. In Vidaurre et al. (2018), it was shown that the default mode network states have high power in the alpha band for the posterior state and in the delta/theta band for the anterior state. The PSDs of modes 4 and 10 also show this, providing further evidence that these modes are an alternative perspective on these states. The contribution of each mode to the default mode network HMM states is shown in Fig. 11b. This shows the ratio of the total power in a mode relative to the total power in an HMM state. We can see that the power in the default mode network states is explained by many modes, i.e. DyNeMo has found a representation of these states that combines many modes. This is also true for the other HMM states. Fig. 11c shows the fraction of power explained by a certain number of modes for each HMM state. The fraction of power explained increases monotonically with the number of modes, with no one particular mode explaining a large fraction of power.
The mode description provided by DyNeMo appears to be fundamentally different to the HMM, no segments of time where one mode dominates are found. Instead, it is a representation where multiple modes co-exist and dynamics are captured by changes in the relative activation of each mode.

3.4. Task MEG data

Resting-state networks are recruited in task

The power maps, FC maps and PSDs of 10 modes inferred by DyNeMo trained from scratch on the task MEG dataset described in Section 2.3.3 are shown in Fig. 12. Very similar functional networks are found in task and resting-state MEG data (see Section 3.3). The main difference between the resting-state and task power maps is that the sensorimotor network has split into two asymmetric modes. This could be due to the more frequent activation of this area in the task dataset, which incentivises the model to infer modes that best describe power at this location.

Fig. 12.

Fig. 12

Resting-state networks are recruited in task. Ten modes were inferred using task MEG data from 51 subjects. Very similar functional networks are inferred as the resting-state data fit shown in Fig. 6. Modes are grouped in terms of their functional role. Each box shows the power map (left), FC map (middle) and PSD relative to the mean averaged over regions of interest (right) for each group. The top two views on the brain in the power map plots are lateral surfaces and the bottom two are medial surfaces. The shaded area in the PSD plots shows the standard error on the mean.

Modes show an evoked response to task

When the inferred mixing coefficient time courses are epoched around task events, an evoked response is seen. With the window around the presentation of the visual stimulus (Fig. 13a, left), DyNeMo shows a strong activation in mode 2, which corresponds to activity in the visual cortex. It also shows smaller peaks in modes 4 (posterior default mode network) and 8 (auditory/language) followed by another larger peak in mode 9 (fronto-parietal network). These represent neural activity moving from the visual cortex to a broader posterior activation and finally to an anterior activation. With the window around the abduction event (Fig. 13a, right), DyNeMo shows a strong peak in mode 5, which corresponds to activity in the motor cortex. This is accompanied by a broader suppression of mode 4, which represents the posterior default mode network. The presence of task-related activations in the mixing coefficient time courses when DyNeMo is unaware of the task structure of the data demonstrates its ability to learn modes that are descriptive of underlying brain activity.

Fig. 13. A consistent task-dependent response to the visuomotor task is seen for a number of modes.

Fig. 13

a) Trial-averaged mode timecourses weighted by the trace of their mode covariances epoched around the visual (left) and abduction (right) task. The red background shows significant time points (p-value < 0.05) calculated using a sign-flip permutation t-test with the family-wise error rate being controlled by using the maximum statistic. b) Individual trial responses (mode mixing coefficients weighted by the trace of their covariance) for mode 2 (visual, left) and mode 5 (sensorimotor, right). The visual stimulus/abduction task occurs at Time = 0 s. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

When considering the individual trials, rather than the average response across trials, we see that the visual mode is consistently activated when the visual stimulus is presented (Fig. 13b, left) and the sensorimotor mode is consistently activated when the abduction occurs (Fig. 13b, right), which suggests the evoked response is not just an aggregated effect. An HMM trained on the same dataset also shows trial-wise activation (Figure S10), although the binary nature of its state activations means that the contribution of a given state can be either wiped out by another state or falsely activated by reduced activity elsewhere. DyNeMo avoids this by allowing a mixture of modes to be active at a given time.

DyNeMo is a more accurate model of dynamic spectral properties compared to an HMM

By epoching the spectrogram of the source reconstructed data, we can see the evoked response to task as a function of frequency (Fig. 14). For the visual task (Fig. 14a, left), immediately after the stimulus we can see a sharp increase in power around 5 Hz followed by a reduction in power around 10 Hz and above. This is repeated around 2 s into the epoch, which is when the visual stimulus is removed. For the abduction task (Fig. 14a, right), immediately after the task we also see a sharp increase in power at 5 Hz followed by a reduction in power at 10 Hz and above. However, this is followed by an increase in power at 10 Hz and above, commonly known as a post-movement beta rebound (Jurkiewicz et al., 2006; Salmelin et al., 1995). We can reconstruct a model estimate for the spectrogram of the data from a DyNeMo (HMM) fit by multiplying the inferred mode (state) time course by the estimate of the mode (state) PSD. Model estimate spectrograms are shown for DyNeMo and the HMM in Fig. 14b and c respectively, along with their reconstruction errors (i.e. the residual, ϵt(f), in Eq. (26) in SI 1.3). The absolute value of the reconstruction error averaged over frequency for DyNeMo and the HMM is shown in Fig. 14d. Both DyNeMo and the HMM are able to model dynamics in the spectral content of the data, however, DyNeMo shows a modest improvement in the time-averaged reconstruction error of 5.0% (4.0%) for the visual (abduction) task compared to 5.2% (4.7%) for the HMM. A paired t-test shows the difference between the DyNeMo and HMM reconstruction error is significant with a p-value < 0.01.
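The model-estimate spectrogram is, at each time point, a mixture of the mode PSDs weighted by the inferred mixing coefficients. A minimal sketch with toy arrays (function and variable names are our own):

```python
import numpy as np

def reconstruct_spectrogram(alpha, mode_psds):
    """Model estimate of the data spectrogram: at each time point, a
    mixture of the mode PSDs weighted by the mixing coefficients.

    alpha     : (n_samples, n_modes) inferred mixing coefficients.
    mode_psds : (n_modes, n_frequencies) PSD of each mode.
    Returns a (n_samples, n_frequencies) spectrogram estimate.
    """
    return alpha @ mode_psds

rng = np.random.default_rng(0)
data_spec = rng.random((200, 64))             # toy data spectrogram
alpha = rng.dirichlet(np.ones(10), size=200)  # toy mixing coefficients
mode_psds = rng.random((10, 64))              # toy mode PSDs

model_spec = reconstruct_spectrogram(alpha, mode_psds)
residual = data_spec - model_spec               # reconstruction error
mean_abs_error = np.abs(residual).mean(axis=1)  # averaged over frequency
```

For an HMM, `alpha` would be replaced by a binary state time course, so at each time point the reconstruction is a single state's PSD rather than a mixture.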

Fig. 14. DyNeMo is a more accurate model of spectral properties compared to an HMM.

Fig. 14

a) Spectrogram of the source reconstructed data epoched around the visual and abduction task. The spectrogram was baseline corrected by subtracting the mean for the duration before the task (for each frequency separately). b) The DyNeMo model reconstruction of the spectrogram epoched around the visual and abduction task (left) and the difference from the spectrogram of the source reconstructed data (right). c) The HMM reconstruction of the spectrogram epoched around the visual and abduction task (left) and the difference from the spectrogram of the source reconstructed data (right). The spectrogram of the data and reconstruction from both models have been normalised to the range -1 to 1. The average spectrogram across all channels is shown. d) Absolute value of the reconstruction error for DyNeMo and the HMM averaged across frequencies for the visual (left) and abduction task (right). The reconstruction error is expressed as a percentage of power at each time point calculated by averaging the spectrograms in (a) over frequency. DyNeMo shows a smaller error in reconstructing the data spectrogram compared to the HMM, indicating it is a more accurate model of spectral properties.

4. Discussion

We have shown that MEG data can be described using multiple modes of spatiotemporal patterns that form large-scale brain networks (Figs. 6 and 12). Recently, other models that provide a mode description of neuroimaging data have been proposed. Ponce-Alvarez et al. and Tewarie et al. used non-negative tensor factorisation to identify dynamic overlapping spatial patterns of connectivity (Ponce-Alvarez et al., 2015; Tewarie et al., 2019). Núñez et al. used community detection on a time series of FC matrices to identify repeated patterns of connectivity (Núñez et al., 2021). Atasoy et al. propose ‘connectome harmonics’, where an eigendecomposition of the Laplacian of a structural connectivity matrix is calculated, which results in a set of harmonic modes that represent spatial patterns of connectivity (Atasoy et al., 2016). Atasoy et al. showed that these modes predict resting-state networks (Atasoy et al., 2016). Glomb et al. and Rué-Queralt et al. used the modes as a basis set to obtain a spatiotemporal description of EEG data, which revealed fast dynamics (Glomb et al., 2020; Rué-Queralt et al., 2021). Although these techniques provide a dynamic description of the data using a set of overlapping spatial modes, they all lack a generative model. Furthermore, connectome harmonics are determined from the structural connectivity matrix. In DyNeMo, a mode description of the FC is learnt directly from the data (see Section 2).

The modes inferred by DyNeMo have distinct spectral properties and correspond to plausible FC systems, such as visual, sensorimotor, auditory or other higher-order cognitive activity. These modes are more localised and can be more lateralised than the spatial patterns attributed to HMM states. Previous analysis of resting-state MEG data using an HMM (Vidaurre et al., 2018) was able to identify large-scale transient networks which exist on time scales between 50 and 100 ms. We find DyNeMo infers transient networks at similar time scales of 100–150 ms (Fig. 9). This implies the fast dynamics inferred by an HMM are not due to the assumption of mutually exclusive states.

An HMM trained on the resting-state MEG dataset used in this work suggested the default mode network was split into an anterior and a posterior component (Vidaurre et al., 2018). In DyNeMo, the default mode network is further split into many modes that combine to represent this network (Fig. 11b). The modes that represent the default mode network show power in the same regions and frequency bands as the HMM states, supporting the view that the modes represent an alternative perspective on the data.

Training DyNeMo on task MEG data, we find similar functional networks to those inferred from resting-state data (Fig. 12). This finding is supported in the literature for other neuroimaging modalities, where the same networks are found in resting-state and task fMRI data (Smith et al., 2009). The similarity in the functional networks could also reflect the fact that the majority of the subjects in the task dataset are also present in the resting-state dataset.

In an unsupervised fashion, DyNeMo was able to infer modes associated with the task. This is seen as an evoked response to the task in a mode’s mixing coefficients (Fig. 14), demonstrating that the modes inferred by DyNeMo meaningfully represent brain activity. The modes also reflect the expected time-frequency response to visual and motor tasks, which builds confidence in the description provided by DyNeMo. We find DyNeMo provides a more accurate model of time-varying spectral features in the training data than an HMM (Fig. 14). However, both DyNeMo and the HMM show errors in modelling high-frequency spectral content in the task MEG dataset. We believe this arises from the PCA step in the data preparation, which retains components that explain large amounts of variance. In this data, lower frequencies have larger amplitudes and are able to explain more variance than high frequencies with smaller amplitudes, leading to high-frequency spectral content being filtered out. Avoiding the loss of this information could be investigated in future work using spectral pre-whitening techniques.
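An evoked response in the mixing coefficients can be computed exactly like a conventional event-related field: epoch each mode's mixing-coefficient time course around the task events and average. This toy sketch (the `epoch_average` helper and the event timings are hypothetical) shows the idea:

```python
import numpy as np

def epoch_average(alpha, event_samples, pre, post):
    """Average a mode's mixing-coefficient time course over epochs
    spanning [t - pre, t + post) samples around each event."""
    epochs = [alpha[t - pre:t + post] for t in event_samples
              if t - pre >= 0 and t + post <= alpha.size]
    return np.mean(epochs, axis=0)

# Toy example: a transient bump 10 samples after each "stimulus"
alpha = np.zeros(1000)
events = [200, 500, 800]
for t in events:
    alpha[t + 10:t + 30] += 1.0
erf = epoch_average(alpha, events, pre=50, post=100)
print(erf.shape)  # (150,) samples around the event
```

Crucially, DyNeMo never sees the event timings during training; they are only used here, post hoc, to epoch the inferred mixing coefficients.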

The smaller reconstruction error for the spectrogram of task MEG data from DyNeMo is due to the linear mixture affording the model greater flexibility to precisely model dynamics. The fact that the reconstruction error is only slightly reduced compared to the HMM suggests that, despite the constraint of mutual exclusivity, the HMM was still able to provide a good description of dynamics.

4.1. Methodological advancements

We believe that DyNeMo improves upon alternative unsupervised techniques in four key ways: the use of amortised inference; the use of the reparameterization trick; the ability to model data as a linear mixture of modes (as opposed to mutually exclusive states); and the ability to model long-range temporal dependencies in the data.

The amortised inference framework used in DyNeMo (described in Section 2) contains a fixed number of trainable parameters (inference RNN weights and biases). This means DyNeMo is readily trainable on datasets of varying size. Usually, the number of trainable parameters in the inference network is significantly smaller than the size of a dataset, making this approach very efficient when scaling to bigger datasets. As the availability of larger datasets grows, so does the need for models that can utilise them. Here, we believe deep learning techniques will play an important role: with more data, models with a deep architecture begin to outperform shallower ones. Although in this work we have studied a relatively small dataset (51–55 subjects) using a shallow model (one RNN layer), DyNeMo is readily scalable in terms of model complexity to include multiple RNN layers and more hidden units. In combination with bigger datasets, this could reveal new insights into brain data. For example, previous modelling of a large resting-state fMRI dataset (Human Connectome Project, Smith et al., 2013) using an HMM revealed a link between FC dynamics and heritable and psychological traits (Vidaurre et al., 2017). The training time for DyNeMo and the computational expense of the analysis presented in this work are comparable to the HMM training time and analysis performed with the HMM-MAR toolbox presented in Vidaurre et al. (2018). Due to the use of amortised inference, we believe DyNeMo will be a more efficient option for larger datasets than the HMM-MAR toolbox.
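The key property of amortisation is that one shared network maps any observation to its posterior parameters, so the parameter count does not grow with the dataset. A minimal numpy sketch (a single linear "encoder" standing in for DyNeMo's inference RNN; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy amortised encoder: one fixed weight matrix maps any observation
# to the parameters (mean, log-variance) of its approximate posterior.
n_channels, n_latent = 8, 3
W_mu = rng.normal(size=(n_channels, n_latent))
W_logvar = rng.normal(size=(n_channels, n_latent))

def encode(x):
    return x @ W_mu, x @ W_logvar

# The same fixed parameter count serves datasets of any length
small = rng.normal(size=(100, n_channels))
large = rng.normal(size=(100000, n_channels))
print(encode(small)[0].shape, encode(large)[0].shape)
n_params = W_mu.size + W_logvar.size
print(n_params)  # 48, regardless of dataset size
```

By contrast, non-amortised variational inference keeps separate posterior parameters per data point, so its parameter count scales linearly with the number of observations.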

Provided we are able to apply the reparameterization trick to sample from the variational posterior distribution, we are able to infer the parameters of any generative model. This facilitates the use of more sophisticated, non-linear observation models and opens up a range of future modelling opportunities, including the use of an autoregressive model capable of learning temporal correlations in the observed data; the hierarchical modelling of inter-subject variability; and the inclusion of dynamics at multiple time scales, similar to the approach used in Pervaiz et al. (2022).
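For a Gaussian posterior, the reparameterization trick (Kingma and Welling, 2014) writes a sample as a deterministic function of the posterior parameters plus parameter-free noise, so gradients can flow through the sampling step. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_reparameterised(mu, logvar, rng):
    """Draw z ~ N(mu, exp(logvar)) as a deterministic, differentiable
    function of (mu, logvar) plus parameter-free noise epsilon."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu = np.array([0.0, 2.0])
logvar = np.array([0.0, -2.0])
# Empirically the samples match the target moments
zs = np.stack([sample_reparameterised(mu, logvar, rng) for _ in range(50000)])
print(zs.mean(axis=0))  # approx. [0, 2]
print(zs.var(axis=0))   # approx. [1, exp(-2)]
```

Any posterior family that admits such a noise-plus-transform decomposition (e.g. via the Gumbel-softmax for categorical variables) can be substituted without changing the rest of the inference machinery.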

A key modelling advancement afforded by DyNeMo is the ability to model data as a time-varying linear sum of modes. The extent to which modes mix is controlled by a free parameter referred to as the temperature, τ, which appears in the softmax transformation of the logits (see Equation (23) in SI 1.2). Low temperatures lead to mutually exclusive modes, whereas high temperatures lead to a soft mixture of modes. In this work, we allow the temperature to be a trainable parameter. By doing this, the output of the softmax transformation can be tuned during training to find the level of mixing that best describes the data. Such a scheme can be interpreted as a form of entropy regularisation (Jang et al., 2016; Pereyra et al., 2017).
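The role of the temperature is easy to see numerically: for fixed logits, a low τ drives the softmax output towards a one-hot vector (HMM-like mutual exclusivity), while a high τ drives it towards a uniform mixture. A small illustration (the function name and example values are ours):

```python
import numpy as np

def softmax_with_temperature(logits, tau):
    """Softmax over modes; tau controls how mixed the output is."""
    z = logits / tau
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
print(softmax_with_temperature(logits, tau=0.05))  # near one-hot (HMM-like)
print(softmax_with_temperature(logits, tau=10.0))  # near uniform (soft mixture)
```

Making τ trainable, as in the text above, lets the data itself choose where the model sits between these two extremes.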

The inclusion of a model RNN in DyNeMo allows it to generate data with long-range temporal dependencies (Figs. 4 and 10). This is because the future value of a hidden logit is determined by a long sequence of previous values, not just the most recent value. There is significant evidence for long-range temporal correlations in M/EEG data (Botcharova et al., 2015; He, 2014; Linkenkaer-Hansen et al., 2001) and an association between altered long-range temporal correlations and disease (Cruz et al., 2021; Moran et al., 2019). Models that are capable of learning long-range temporal correlations are advantageous in multiple ways: they can be more predictive of task or disease than models with a shorter memory; they can prevent overfitting to noise in the training data through regularisation; and they can be used to synthesise data with realistic long-range neural dynamics.
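The generative mechanism can be sketched in a few lines: a recurrent hidden state carries memory forward, so the logits (and hence the mixing coefficients) at time t depend on the whole history rather than only the previous step, as they would under a first-order Markov assumption. This is a deliberately minimal stand-in for DyNeMo's model RNN, with arbitrary random weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal recurrent generator: hidden state h carries long-range memory
n_modes, n_hidden, T = 3, 16, 200
W_h = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.5, size=(n_hidden, n_modes))

h = np.zeros(n_hidden)
alphas = []
for t in range(T):
    h = np.tanh(W_h @ h + rng.normal(scale=0.1, size=n_hidden))
    logits = W_out.T @ h
    e = np.exp(logits - logits.max())
    alphas.append(e / e.sum())  # softmax -> mixing coefficients
alphas = np.array(alphas)
print(alphas.shape)  # (200, 3) time courses of mixing coefficients
```

In the full model these mixing coefficients would then weight the mode covariances to generate the observed data at each time point.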

In addition to the modelling and inference advancements discussed above, we also proposed a new method for calculating spectral properties for data described using a set of modes (see Section 2.4). With an HMM, methods such as a multitaper (Vidaurre et al., 2016) can be used to provide high-resolution estimates of PSDs and coherences for each state. This approach relies on the state time course identifying segments of the training data where only one state is active. This approach is no longer feasible with a description of the data as a set of co-existing modes. In this paper, we propose fitting a linear regression model to a cross spectrogram calculated using the data. This method relies on different time points having different ratios of mixing between the modes. Provided this is the case, this method produces high-resolution estimates of the PSD and coherence of each mode (Figs. 6, 12 and 14).
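The regression idea can be demonstrated on synthetic data: build a short-time power spectrum (a simple spectrogram), then least-squares regress the window-averaged mixing coefficients onto it, yielding one PSD per mode. This is a simplified numpy sketch of the approach described above (Section 2.4 of the paper uses a cross spectrogram; here we use an ordinary power spectrogram, and all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, win = 250, 250  # 250 Hz sampling, 1 s windows
T = fs * 60
t = np.arange(T) / fs

# Two modes; mode 2 carries a 20 Hz rhythm whose amplitude tracks alpha_2
a2 = 0.5 - 0.5 * np.sin(2 * np.pi * 0.1 * t)
alpha = np.stack([1 - a2, a2], axis=1)
x = rng.normal(size=T) + 3 * a2 * np.sin(2 * np.pi * 20 * t)

# Short-time power spectra in non-overlapping 1 s windows
segs = x.reshape(-1, win)
power = np.abs(np.fft.rfft(segs, axis=1)) ** 2   # (n_windows, n_freqs)
f = np.fft.rfftfreq(win, d=1 / fs)
alpha_win = alpha.reshape(-1, win, 2).mean(axis=1)  # window-averaged mixing

# Regress mixing coefficients onto the spectrogram: power ~ alpha_win @ psd
psd_modes, *_ = np.linalg.lstsq(alpha_win, power, rcond=None)
print(f[psd_modes[1].argmax()])  # mode 2's recovered PSD peaks at 20 Hz
```

As noted in the text, this only works because different windows have different mixing ratios; if the mixing coefficients were constant, the design matrix would be rank-deficient and the per-mode spectra unidentifiable.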

4.2. Drawbacks

As with most modern machine learning models, DyNeMo contains a large number of hyperparameters that need to be specified before the model can be trained. These are discussed in SI 1.2. An important hyperparameter that affects the interpretation of inferences from the model is the number of modes, J. We discuss the impact of varying the number of modes in SI 1.5. In short, as the number of modes is increased, the spatial activity of each mode becomes more localised and the variability of the inferred spatial patterns increases. The variational free energy is an approximation to the model evidence (Friston et al., 2007), so it can be used to compare models with different numbers of modes. However, Figure S4 shows the variational free energy decreases monotonically up to 30 modes, implying that more modes provide a better model for the data. As we increase the number of modes, however, we lose the low-dimensional, interpretable description of the data. Because of this trade-off, we specify the number of modes by hand rather than using the variational free energy, and we ensure any conclusions based on studies using DyNeMo are not sensitive to the number of modes chosen. We tune the other hyperparameters by seeking the set that minimises the value of the loss function.

In addition to a large number of hyperparameters, we find the model is sensitive to the initialisation of trainable parameters. This includes the internal weights and biases of RNN layers and the learnable free parameters for the mode means and covariances. The initialisations used in this work are listed in SI 1.2. We found the initialisation of the mode covariances to be particularly important. We overcome the issue of sensitivity to the initialisation of trainable parameters by training the model from scratch with different initialisations and only retaining the model with the lowest loss.
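The multiple-initialisation strategy is a standard random-restart pattern: train from scratch several times with different seeds and keep the run with the lowest final loss. A trivial sketch, where `train` is a stand-in for a full DyNeMo training run:

```python
import numpy as np

def train(seed):
    """Stand-in for a full training run: returns (parameters, final loss)."""
    rng = np.random.default_rng(seed)
    params = rng.normal(size=4)        # stand-in for RNN weights, covariances
    loss = float(np.sum(params ** 2))  # stand-in for the variational free energy
    return params, loss

# Random restarts: keep only the model with the lowest loss
runs = [train(seed) for seed in range(10)]
best_params, best_loss = min(runs, key=lambda r: r[1])
print(best_loss)
```

Because the variational free energy bounds the model evidence from above, choosing the lowest-loss run is the natural selection criterion among restarts.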

4.3. Outlook and future applications

The model presented here has many possible future applications. For example, it could be used to provide a dynamic and interpretable latent description, as done in this work, for other datasets. Alternatively, it could be used to facilitate future studies, examples of which are described below.

A common method to study the brain is the use of temporally unconstrained multivariate pattern analysis (decoding) to predict task, disease or behavioural traits (Vidaurre et al., 2019). The latent representation inferred by DyNeMo (unsupervised) provides a low-dimensional form of the training data, which is ideal for such analyses. This can overcome overfitting issues that are commonly encountered in decoding studies that use the raw data directly. Alternatively, the model architecture could easily be modified to form a semi-supervised learning problem, in which the loss function has a joint objective: to learn a low-dimensional representation that is useful for decoding as well as for reconstructing the training data.
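The dimensionality argument can be illustrated with a toy decoder: instead of classifying from thousands of raw sensor values per window, classify from a handful of mixing coefficients. Everything here is synthetic and the nearest-centroid decoder is a deliberately minimal stand-in for a real classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decoding: "task" vs "rest" windows, described by 3 mixing coefficients
n = 400
labels = rng.integers(0, 2, size=n)
alpha = rng.dirichlet([1.0, 1.0, 1.0], size=n)
# Task windows lean towards mode 3
alpha[labels == 1] = 0.7 * alpha[labels == 1] + 0.3 * np.array([0.0, 0.0, 1.0])

train = np.arange(n) < 300
test = ~train
# Nearest-centroid decoder as a minimal stand-in for a full classifier
centroids = np.stack([alpha[train & (labels == k)].mean(axis=0) for k in (0, 1)])
dists = np.linalg.norm(alpha[test][:, None, :] - centroids[None], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == labels[test]).mean()
print(accuracy)
```

With only three features per window, even this crude decoder generalises above chance; a raw-data decoder with thousands of features would need far stronger regularisation to avoid overfitting.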

A useful feature of DyNeMo is the possibility of transfer learning, i.e. the ability to transfer information learnt from one dataset to another. This could be exercised by simply training DyNeMo on one dataset from scratch, before fine-tuning the model on another dataset, which would facilitate the transfer of information through all the trainable parameters of the model, such as RNN weights, mode means/covariances, etc. Large resting-state datasets are commonplace in neuroimaging. A problem encountered in studies of small datasets (e.g. comprising diseased cohorts) is the lack of statistical power for drawing meaningful conclusions (Poldrack et al., 2017). Leveraging information gained from larger resting-state datasets could improve the predictions made on smaller datasets. For example, it has been shown that resting-state data is predictive of task response (Becker et al., 2020; Tavor et al., 2016). We believe DyNeMo offers the possibility of transferring information acquired from resting-state datasets with thousands of individuals to the individual subject level.

The generative model proposed here explicitly models the covariance of the training data as a dynamic quantity. In this paper, we trained on prepared (time-delay embedded/PCA) source reconstructed data. However, the model could be trained on unprepared sensor-level data to estimate the sensor covariance as a function of time. Such a model could be utilised in the field of M/EEG source reconstruction. Algorithms for source reconstruction often assume the sensor-level covariance is static, which is rarely the case (Gómez et al., 2021). Using a dynamic estimate of the covariance, we can construct time-varying reconstruction weights for source reconstruction (Woolrich et al., 2013), which can improve source localisation.
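In this time-varying description, the covariance at each time point is a mixture of per-mode covariances weighted by the mixing coefficients. A numpy sketch of this construction (dimensions and weights are arbitrary; this is an illustration of the mixture form, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Time-varying covariance as a mixture of J mode covariances D_j:
#   C_t = sum_j alpha_{jt} * D_j
J, n, T = 3, 5, 100
A = rng.normal(size=(J, n, n))
mode_covs = A @ A.transpose(0, 2, 1) + n * np.eye(n)  # J SPD matrices
alpha = rng.dirichlet(np.ones(J), size=T)             # (T, J), rows sum to 1

cov_t = np.einsum('tj,jcd->tcd', alpha, mode_covs)    # (T, n, n)
print(cov_t.shape)
# Each C_t is a convex combination of SPD matrices, hence itself SPD
print(all(np.all(np.linalg.eigvalsh(Ct) > 0) for Ct in cov_t))
```

Because the mixing coefficients are non-negative and sum to one, every C_t is automatically a valid (positive definite) covariance, which is what would make it usable for constructing time-varying beamformer weights.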

Finally, whilst we focused on parcellated source reconstructed MEG data in this paper, DyNeMo could of course be applied to data from other neuroimaging modalities such as fMRI, sensor level MEG data and other electrophysiological techniques (EEG, ECOG, etc.).

Conclusions

We have proposed a new generative model and accompanying inference framework for neuroimaging data that is readily scalable to large datasets. Our application of DyNeMo to MEG data reveals fast transient networks that are spectrally distinct, in broad agreement with existing studies. We believe DyNeMo can be used to help us better understand the brain by providing an accurate model for brain data that explicitly models its dynamic nature using a linear mixture of modes. The modest improvement in modelling dynamic spectral properties compared to an HMM shows the assumption of mutual exclusivity does not necessarily impact the HMM’s ability to model the data effectively. Nevertheless, DyNeMo is a novel and complementary tool that is useful for studying neuroimaging data.

Supplementary Material

Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.neuroimage.2022.119595.

Supplementary Data S1

Acknowledgments

We would like to thank Matt Brookes and his team at the University of Nottingham for providing us with the data analysed here. Data were collected in the context of the Medical Research Council (MRC)-funded MEG UK partnership. We would also like to thank Yan-Ping Zhang-Schaerer for her feedback in developing DyNeMo.

This research was supported by the National Institute for Health Research (NIHR) Oxford Health Biomedical Research Centre. The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust (203139/Z/16/Z). C.G. is supported by the Wellcome Trust (215573/Z/19/Z). E.R. is supported by an Engineering and Physical Sciences Research Council (EPSRC) and MRC grant (EP/L016044/1) and F. Hoffmann-La Roche. R.T. is supported by the EPSRC and MRC (EP/L016052/1). C.H. is supported by the Wellcome Trust (215573/Z/19/Z). A.Q. is supported the MRC (RG94383/RG89702) and by the NIHR Oxford Health Biomedical Research Centre. U.P. is supported by an MRC Mental Health Data Pathfinder award (MC/PC/17215). J.vA. is supported by EPSRC (EP/N509711/1) and Google DeepMind. P.N. is supported by EPSRC Industrial Cooperative Awards in Science & Technology (18000077) and GlaxoSmithKline. Y.G. holds a Turing AI Fellowship (Phase 1) at the Alan Turing Institute, which is supported by the EPSRC (V030302/1). M.W. is supported by NIHR Oxford Health Biomedical Research Centre, the Wellcome Trust (106183/Z/14/Z and 215573/Z/19/Z), and the New Therapeutics in Alzheimer’s Diseases (NTAD) study supported by the MRC and the Dementia Platform UK.

Footnotes

Ethics Statement

All participants gave written informed consent and ethical approval was granted by the University of Nottingham Medical School Research Ethics Committee.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Credit authorship contribution statement

Chetan Gohil: Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization. Evan Roberts: Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft, Visualization. Ryan Timms: Methodology, Software, Writing – original draft. Alex Skates: Methodology, Software. Cameron Higgins: Formal analysis. Andrew Quinn: Formal analysis. Usama Pervaiz: Methodology. Joost van Amersfoort: Methodology. Pascal Notin: Methodology. Yarin Gal: Supervision. Stanislaw Adaszewski: Supervision. Mark Woolrich: Methodology, Software, Data curation, Supervision.

2

Including the positivity constraint enables us to interpret the αt values as mixing coefficients and the sum to one constraint ensures the distribution of mixing coefficients is sufficiently non-Gaussian for the model to be identifiable (Eriksson and Koivunen, 2004).

3

Note, standardisation was also performed before PCA.

4

We only use the maximum a posteriori probability estimate post-hoc, during training we sample from the variational posterior distribution using the reparameterization trick.

Data and Code Availability Statement

The MEG dataset used in this work was acquired at the University of Nottingham in the context of the MEG UK Partnership. Access to the MEG UK database can be requested at http://meguk.ac.uk/contact. An implementation of DyNeMo written in Python can be accessed here: https://github.com/OHBA-analysis/osl-dynamics. Version v1.0.0 of osl-dynamics was used in this work with Python 3.8 and TensorFlow 2.4.

References

  1. Allen E, Damaraju E, Plis S, Erhardt E, Eichele T, Calhoun V. Tracking whole-brain connectivity dynamics in the resting state. Cereb Cortex. 2014;24:663–676. doi: 10.1093/cercor/bhs352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Atasoy S, Donnelly I, Pearson J. Human brain networks function in connectome-specific harmonic waves. Nat Commun. 2016;7:1–10. doi: 10.1038/ncomms10340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Baker A, Brookes M, Rezek I, Smith S, Behrens T, Smith P, Woolrich M. Fast transient networks in spontaneous human brain activity. Elife. 2014;3:E01867. doi: 10.7554/eLife.01867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Becker R, Vidaurre D, Quinn A, Abeysuriya R, Jones O, Jbabdi S, Woolrich M. Transient spectral events in resting state MEG predict individual task responses. Neuroimage. 2020;215:116818. doi: 10.1016/j.neuroimage.2020.116818. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Beckmann C, DeLuca M, Devlin J, Smith S. Investigations into resting-state connectivity using independent component analysis. Philos Trans R Soc BBiol Sci. 2005;360:1001–1013. doi: 10.1098/rstb.2005.1634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Betti V, Della Penna S, De Pasquale F, Mantini D, Marzetti L, Romani G, Corbetta M. Natural scenes viewing alters the dynamics of functional connectivity in the human brain. Neuron. 2013;79:782–797. doi: 10.1016/j.neuron.2013.06.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bishop C. Pattern Recognition and Machine Learning. Springer; 2007. [Google Scholar]
  8. Botcharova M, Berthouze L, Brookes M, Barnes G, Farmer S. Resting state MEG oscillations show long-range temporal correlations of phase synchrony that break down during finger movement. Front Physiol. 2015;6:183. doi: 10.3389/fphys.2015.00183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bowman S, Vilnis L, Vinyals O, Dai A, Jozefowicz R, Bengio S. Generating sentences from a continuous space. 2015 [Google Scholar]
  10. Brookes M, Hale J, Zumer J, Stevenson C, Francis S, Barnes G, Owen J, Morris P, Nagarajan S. Measuring functional connectivity using MEG: methodology and comparison with fcMRI. Neuroimage. 2011;56:1082–1104. doi: 10.1016/j.neuroimage.2011.02.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Brookes M, O’Neill G, Hall E, Woolrich M, Baker A, Corner S, Robson S, Morris P, Barnes G. Measuring temporal, spectral and spatial changes in electrophysiological brain network connectivity. Neuroimage. 2014;91:282–299. doi: 10.1016/j.neuroimage.2013.12.066. [DOI] [PubMed] [Google Scholar]
  12. Brookes M, Woolrich M, Luckhoo H, Price D, Hale J, Stephenson M, Barnes G, Smith S, Morris P. Investigating the electrophysiological basis of resting state networks using magnetoencephalography. Proc Natl Acad Sci. 2011;108:16783–16788. doi: 10.1073/pnas.1112685108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Brovelli A, Badier J, Bonini F, Bartolomei F, Coulon O, Auzias G. Dynamic reconfiguration of visuomotor-related functional connectivity networks. J Neurosci. 2017;37:839–853. doi: 10.1523/JNEUROSCI.1672-16.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Capilla A, Arana L, García-Huéscar M, Melcón M, Gross J, Campo P. The natural frequencies of the resting human brain: an MEG-based atlas. Neuroimage. 2022;258:119373. doi: 10.1016/j.neuroimage.2022.119373. [DOI] [PubMed] [Google Scholar]
  15. Carbo E, Hillebrand A, Van Dellen E, Tewarie P, Witt Hamer P, Baayen J, Klein M, Geurts J, Reijneveld J, Stam C. Others dynamic hub load predicts cognitive decline after resective neurosurgery. Sci Rep. 2017;7:1–10. doi: 10.1038/srep42117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Chang C, Liu Z, Chen M, Liu X, Duyn J. EEG correlates of time-varying BOLD functional connectivity. Neuroimage. 2013;72:227–236. doi: 10.1016/j.neuroimage.2013.01.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Colclough G, Brookes M, Smith S, Woolrich M. A symmetric multivariate leakage correction for MEG connectomes. Neuroimage. 2015;117:439–448. doi: 10.1016/j.neuroimage.2015.03.071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Cole M, Bassett D, Power J, Braver T, Petersen S. Intrinsic and task-evoked network architectures of the human brain. Neuron. 2014;83:238–251. doi: 10.1016/j.neuron.2014.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Cruz G, Grent T, Krishnadas R, Palva M, Palva S, Uhlhaas P. Long range temporal correlations (LRTCs) in MEG-data during emerging psychosis: relationship to symptoms, medication-status and clinical trajectory. Neuroimage. 2021:102722. doi: 10.1016/j.nicl.2021.102722. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. De Pasquale F, Della Penna S, Snyder A, Lewis C, Mantini D, Marzetti L, Belardinelli P, Ciancetta L, Pizzella V, Romani G. Temporal dynamics of spontaneous MEG activity in brain networks. Proc Natl Acad Sci. 2010;107:6040–6045. doi: 10.1073/pnas.0913863107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. De Pasquale F, Della Penna S, Snyder A, Marzetti L, Pizzella V, Romani G, Corbetta M. A cortical core for dynamic integration of functional networks in the resting human brain. Neuron. 2012;74:753–764. doi: 10.1016/j.neuron.2012.03.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. De Pasquale F, Della Penna S, Sporns O, Romani G, Corbetta M. A dynamic core network and global efficiency in the resting human brain. Cereb Cortex. 2016;26:4015–4033. doi: 10.1093/cercor/bhv185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Elton A, Gao W. Task-related modulation of functional connectivity variability and its behavioral correlations. Hum Brain Mapp. 2015;36:3260–3272. doi: 10.1002/hbm.22847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Engel A, Gerloff C, Hilgetag C, Nolte G. Intrinsic coupling modes: multiscale interactions in ongoing brain activity. Neuron. 2013;80:867–886. doi: 10.1016/j.neuron.2013.09.038. [DOI] [PubMed] [Google Scholar]
  25. Eriksson J, Koivunen V. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Process Lett. 2004;11:601–604. [Google Scholar]
  26. Fries P. Rhythms for cognition: communication through coherence. Neuron. 2015;88:220–235. doi: 10.1016/j.neuron.2015.09.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Friston K. Functional and effective connectivity in neuroimaging: a synthesis. Hum Brain Mapp. 1994;2:56–78. [Google Scholar]
  28. Friston K, Kilner J, Harrison L. A free energy principle for the brain. J Physiol-Paris. 2006;100:70–87. doi: 10.1016/j.jphysparis.2006.10.001. [DOI] [PubMed] [Google Scholar]
  29. Friston K, Mattout J, Trujillo-Barreto N, Ashburner J, Penny W. Variational free energy and the laplace approximation. Neuroimage. 2007;34:220–234. doi: 10.1016/j.neuroimage.2006.08.035. [DOI] [PubMed] [Google Scholar]
  30. Géron A. Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc; 2019. Hands-on Machine Learning with Scikit-learn. [Google Scholar]
  31. Glomb K, Queralt J, Pascucci D, Defferrard M, Tourbier S, Carboni M, Rubega M, Vulliemoz S, Plomp G, Hagmann P. Connectome spectral analysis to track EEG task dynamics on a subsecond scale. Neuroimage. 2020;221:117137. doi: 10.1016/j.neuroimage.2020.117137. [DOI] [PubMed] [Google Scholar]
  32. Gómez G, Peigneux P, Wens V, Bourguignon M. Localization accuracy of a common beamformer for the comparison of two conditions. Neuroimage. 2021;230:117793. doi: 10.1016/j.neuroimage.2021.117793. [DOI] [PubMed] [Google Scholar]
  33. Gschwind M, Michel C, De Ville DV. Long-range dependencies make the difference - comment on “A Stochastic model for EEG microstate sequence analysis”. Neuroimage. 2015;117:449–455. doi: 10.1016/j.neuroimage.2015.05.062. [DOI] [PubMed] [Google Scholar]
  34. He B. Scale-free brain activity: past, present, and future. Trends Cogn Sci. 2014;18:480–487. doi: 10.1016/j.tics.2014.04.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Higgins C, Liu Y, Vidaurre D, Kurth-Nelson Z, Dolan R, Behrens T, Woolrich M. Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron. 2021;109:882–893. doi: 10.1016/j.neuron.2020.12.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Hipp J, Hawellek D, Corbetta M, Siegel M, Engel A. Large-scale cortical correlation structure of spontaneous oscillatory activity. Nat Neurosci. 2012;15:884–890. doi: 10.1038/nn.3101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Huang M, Mosher J, Leahy R. A sensor-weighted overlapping-sphere head model and exhaustive head model comparison for MEG. Phys Med Biol. 1999;44:423. doi: 10.1088/0031-9155/44/2/010. [DOI] [PubMed] [Google Scholar]
  38. Hunt B, Liddle E, Gascoyne L, Magazzini L, Routley B, Singh K, Morris P, Brookes M, Liddle P. Attenuated post-movement beta rebound associated with schizotypal features in healthy people. Schizophr Bull. 2019;45:883–891. doi: 10.1093/schbul/sby117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Hutchison R, Womelsdorf T, Allen E, Bandettini P, Calhoun V, Corbetta M, Della Penna S, Duyn J, Glover G, Gonzalez-Castillo J. Dynamic functional connectivity: promise, issues, and interpretations. Neuroimage. 2013;80:360–378. doi: 10.1016/j.neuroimage.2013.05.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Jang E, Gu S, Poole B. Categorical reparameterization with gumbel-softmax. arXiv preprint. 2016:arXiv:1611.01144 [Google Scholar]
  41. Jenkinson M, Pechaud M, Smith S. BET2: MR-based estimation of brain, skull and scalp surfaces; Eleventh Annual Meeting of The Organization For Human Brain Mapping; 2005. p. 167. [Google Scholar]
  42. Jurkiewicz M, Gaetz W, Bostan A, Cheyne D. Post-movement beta rebound is generated in motor cortex: evidence from neuromagnetic recordings. Neuroimage. 2006;32:1281–1289. doi: 10.1016/j.neuroimage.2006.06.005. [DOI] [PubMed] [Google Scholar]
  43. Kingma D, Welling M. Auto-encoding variational bayes; International Conference on Learning Representations (ICLR); 2014. [Google Scholar]
  44. Kucyi A, Davis K. Dynamic functional connectivity of the default mode network tracks daydreaming. Neuroimage. 2014;100:471–480. doi: 10.1016/j.neuroimage.2014.06.044. [DOI] [PubMed] [Google Scholar]
  45. Laird A, Fox P, Eickhoff S, Turner J, Ray K, McKay D, Glahn D, Beckmann C, Smith S, Fox P. Behavioral interpretations of intrinsic connectivity networks. J Cogn Neurosci. 2011;23:4022–4037. doi: 10.1162/jocn_a_00077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Liégeois R, Ziegler E, Phillips C, Geurts P, Gomez F, Bahri M, Yeo B, Soddu A, Vanhaudenhuyse A, Laureys S. Cerebral functional connectivity periodically (de) synchronizes with anatomical constraints. Brain Struct Funct. 2016;221:2985–2997. doi: 10.1007/s00429-015-1083-y. [DOI] [PubMed] [Google Scholar]
  47. Lindquist M, Xu Y, Nebel M, Caffo B. Evaluating dynamic bivariate correlations in resting-state fMRI: a comparison study and a new approach. Neuroimage. 2014;101:531–546. doi: 10.1016/j.neuroimage.2014.06.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Linkenkaer-Hansen K, Nikouline V, Palva J, Ilmoniemi R. Long-range temporal correlations and scaling behavior in human brain oscillations. J Neurosci. 2001;21:1370–1377. doi: 10.1523/JNEUROSCI.21-04-01370.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Luckhoo H, Hale J, Stokes M, Nobre A, Morris P, Brookes M, Woolrich M. Inferring task-related networks using independent component analysis in magnetoencephalography. Neuroimage. 2012;62:530–541. doi: 10.1016/j.neuroimage.2012.04.046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Meisel C, Bailey K, Achermann P, Plenz D. Decline of long-range temporal correlations in the human brain during sustained wakefulness. Sci Rep. 2017;7:1–11. doi: 10.1038/s41598-017-12140-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Michel C, Koenig T. EEG microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: a review. Neuroimage. 2018;180:577–593. doi: 10.1016/j.neuroimage.2017.11.062. [DOI] [PubMed] [Google Scholar]
  52. Moran J, Michail G, Heinz A, Keil J, Senkowski D. Long-range temporal correlations in resting state beta oscillations are reduced in schizophrenia. Front Psychiatry. 2019;10:517. doi: 10.3389/fpsyt.2019.00517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Núñez P, Poza J, Gómez C, Rodriguez-Gonzalez V, Hillebrand A, Tewarie P, Tola-Arribas M, Cano M, Hornero R. Abnormal meta-state activation of dynamic brain networks across the alzheimer spectrum. Neuroimage. 2021;232:117898. doi: 10.1016/j.neuroimage.2021.117898. [DOI] [PubMed] [Google Scholar]
  54. O’Neill G, Bauer M, Woolrich M, Morris P, Barnes G, Brookes M. Dynamic recruitment of resting state sub-networks. Neuroimage. 2015;115:85–95. doi: 10.1016/j.neuroimage.2015.04.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. O’Neill G, Tewarie P, Colclough G, Gascoyne L, Hunt B, Morris P, Woolrich M, Brookes M. Measurement of dynamic task related functional networks using MEG. Neuroimage. 2017;146:667–678. doi: 10.1016/j.neuroimage.2016.08.061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. OSL. https://github.com/OHBA-analysis/osl-core .
  57. Palva J, Wang S, Palva S, Zhigalov A, Monto S, Brookes M, Schoffelen J, Jerbi K. Ghost interactions in MEG/EEG source space: anote of caution on inter-areal coupling measures. Neuroimage. 2018;173:632–643. doi: 10.1016/j.neuroimage.2018.02.032. [DOI] [PubMed] [Google Scholar]
  58. Papoulis A, Saunders H. Probability, random variables and stochastic processes. 1989 [Google Scholar]
  59. Pervaiz U, Vidaurre D, Gohil C, Smith S, Woolrich M. Multi-dynamic modelling reveals strongly time-varying resting fMRI correlations. Med Image Anal. 2022;77:102366. doi: 10.1016/j.media.2022.102366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Poldrack R, Baker C, Durnez J, Gorgolewski K, Matthews P, Munafo M, Nichols T, Poline J, Vul E, Yarkoni T. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2017;18:115–126. doi: 10.1038/nrn.2016.167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Ponce-Alvarez A, Deco G, Hagmann P, Romani G, Mantini D, Corbetta M. Resting-state temporal synchronization networks emerge from connectivity topology and heterogeneity. PLoS Comput Biol. 2015;11:E1004100. doi: 10.1371/journal.pcbi.1004100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Preti MG, Bolton TA, Van De Ville D. The dynamic functional connectome: state-of-the-art and perspectives. Neuroimage. 2017;160:41–54. doi: 10.1016/j.neuroimage.2016.12.061. [DOI] [PubMed] [Google Scholar]
  63. Quinn A, Vidaurre D, Abeysuriya R, Becker R, Nobre A, Woolrich M. Task-evoked dynamic network analysis through hidden Markov modeling. Front Neurosci. 2018;12:603. doi: 10.3389/fnins.2018.00603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Rabiner L, Juang B. An introduction to hidden Markov models. IEEE ASSP Mag. 1986;3:4–16. [Google Scholar]
  65. Rue-Quéralt J, Glomb K, Pascucci D, Tourbier S, Carboni M, Vulliémoz S, Plomp G, Hagmann P. The connectome spectrum as a canonical basis for a sparse representation of fast brain activity. Neuroimage. 2021;244:118611. doi: 10.1016/j.neuroimage.2021.118611. [DOI] [PubMed] [Google Scholar]
  66. Sakoglu U, Pearlson G, Kiehl K, Wang Y, Michael A, Calhoun V. A method for evaluating dynamic functional network connectivity and task-modulation: application to schizophrenia. Magn Reson Mater Phys Biol Med. 2010;23:351–366. doi: 10.1007/s10334-010-0197-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Salmelin R, Hämäläinen M, Kajola M, Hari R. Functional segregation of movement-related rhythmic activity in the human brain. Neuroimage. 1995;2:237–243. doi: 10.1006/nimg.1995.1031. [DOI] [PubMed] [Google Scholar]
  68. Seedat Z, Quinn A, Vidaurre D, Liuzzi L, Gascoyne L, Hunt B, O’Neill G, Pakenham D, Mullinger K, Morris P. The role of transient spectral ‘bursts’ in functional connectivity: a magnetoencephalography study. Neuroimage. 2020;209:116537. doi: 10.1016/j.neuroimage.2020.116537. [DOI] [PubMed] [Google Scholar]
  69. Smith S. Fast robust automated brain extraction. Hum Brain Mapp. 2002;17:143–155. doi: 10.1002/hbm.10062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Smith S, Beckmann C, Andersson J, Auerbach E, Bijsterbosch J, Douaud G, Duff E, Feinberg D, Griffanti L, Harms M. Resting-state fMRI in the human connectome project. Neuroimage. 2013;80:144–168. doi: 10.1016/j.neuroimage.2013.05.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Smith S, Fox P, Miller K, Glahn D, Fox P, Mackay C, Filippini N, Watkins K, Toro R, Laird A. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci. 2009;106:13040–13045. doi: 10.1073/pnas.0905267106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Sporns O, Faskowitz J, Teixeira A, Cutts S, Betzel R. Dynamic expression of brain functional systems disclosed by fine-scale analysis of edge time series. Netw Neurosci. 2021;5:405–433. doi: 10.1162/netn_a_00182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Stam C, De Haan W, Daffertshofer A, Jones B, Manshanden I, Walsum A, Montez T, Verbunt J, De Munck J, Van Dijk B. Graph theoretical analysis of magnetoencephalographic functional connectivity in Alzheimer’s disease. Brain. 2009;132:213–224. doi: 10.1093/brain/awn262. [DOI] [PubMed] [Google Scholar]
  74. Stevens M. The contributions of resting state and task-based functional connectivity studies to our understanding of adolescent brain network maturation. Neurosci Biobehav Rev. 2016;70:13–32. doi: 10.1016/j.neubiorev.2016.07.027. [DOI] [PubMed] [Google Scholar]
  75. Stoffers D, Bosboom J, Wolters E, Stam C, Berendse H. Dopaminergic modulation of cortico-cortical functional connectivity in Parkinson’s disease: an MEG study. Exp Neurol. 2008;213:191–195. doi: 10.1016/j.expneurol.2008.05.021. [DOI] [PubMed] [Google Scholar]
  76. Tagliazucchi E, Von Wegner F, Morzelewski A, Brodbeck V, Laufs H. Dynamic BOLD functional connectivity in humans and its electrophysiological correlates. Front Hum Neurosci. 2012;6:339. doi: 10.3389/fnhum.2012.00339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Tavor I, Jones O, Mars R, Smith S, Behrens T, Jbabdi S. Task-free MRI predicts individual differences in brain activity during task performance. Science. 2016;352:216–220. doi: 10.1126/science.aad8127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Pereyra G, Tucker G, Chorowski J, Kaiser L, Hinton G. Regularizing neural networks by penalizing confident output distributions. arXiv preprint. 2017:arXiv:1701.06548 [Google Scholar]
  79. Tewarie P, Liuzzi L, O’Neill G, Quinn A, Griffa A, Woolrich M, Stam C, Hillebrand A, Brookes M. Tracking dynamic brain networks using high temporal resolution MEG measures of functional connectivity. Neuroimage. 2019;200:38–50. doi: 10.1016/j.neuroimage.2019.06.006. [DOI] [PubMed] [Google Scholar]
  80. Trujillo-Barreto N, Araya D, El-Deredy W. The discrete logic of the brain - explicit modelling of brain state durations in EEG and MEG. BioRxiv. 2019:635300. [Google Scholar]
  81. Van Veen B, Buckley K. Beamforming: a versatile approach to spatial filtering. IEEE ASSP Mag. 1988;5:4–24. [Google Scholar]
  82. Vidaurre D, Hunt L, Quinn A, Hunt B, Brookes M, Nobre A, Woolrich M. Spontaneous cortical activity transiently organises into frequency specific phase-coupling networks. Nat Commun. 2018;9:1–13. doi: 10.1038/s41467-018-05316-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Vidaurre D, Myers N, Stokes M, Nobre A, Woolrich M. Temporally unconstrained decoding reveals consistent but time-varying stages of stimulus processing. Cereb Cortex. 2019;29:863–874. doi: 10.1093/cercor/bhy290. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Vidaurre D, Quinn A, Baker A, Dupret D, Tejero-Cantero A, Woolrich M. Spectrally resolved fast transient brain states in electrophysiological data. Neuroimage. 2016;126:81–95. doi: 10.1016/j.neuroimage.2015.11.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Vidaurre D, Smith S, Woolrich M. Brain network dynamics are hierarchically organized in time. Proc Natl Acad Sci. 2017;114:12827–12832. doi: 10.1073/pnas.1705120114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Woolrich M, Baker A, Luckhoo H, Mohseni H, Barnes G, Brookes M, Rezek I. Dynamic state allocation for MEG source reconstruction. Neuroimage. 2013;77:77–92. doi: 10.1016/j.neuroimage.2013.03.036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Woolrich M, Jbabdi S, Patenaude B, Chappell M, Makni S, Behrens T, Beckmann C, Jenkinson M, Smith S. Bayesian analysis of neuroimaging data in FSL. Neuroimage. 2009;45:S173–S186. doi: 10.1016/j.neuroimage.2008.10.055. [DOI] [PubMed] [Google Scholar]
  88. Yang Z, LaConte S, Weng X, Hu X. Ranking and averaging independent component analysis by reproducibility (RAICAR). Hum Brain Mapp. 2008;29:711–725. doi: 10.1002/hbm.20432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Yu S. Hidden semi-Markov models. Artif Intell. 2010;174:215–243. [Google Scholar]
  90. Zhang C, Bütepage J, Kjellström H, Mandt S. Advances in variational inference. IEEE Trans Pattern Anal Mach Intell. 2018;41:2008–2026. doi: 10.1109/TPAMI.2018.2889774. [DOI] [PubMed] [Google Scholar]

Associated Data
Supplementary Materials

Supplementary Data S1

Data Availability Statement

The MEG dataset used in this work was acquired at the University of Nottingham in the context of the MEG UK Partnership. Access to the MEG UK database can be requested at http://meguk.ac.uk/contact. An implementation of DyNeMo written in Python can be accessed here: https://github.com/OHBA-analysis/osl-dynamics. Version v1.0.0 of osl-dynamics was used in this work with Python 3.8 and TensorFlow 2.4.