Author manuscript; available in PMC: 2025 Nov 20.
Published in final edited form as: IEEE Trans Med Imaging. 2025 Nov;44(11):4627–4638. doi: 10.1109/TMI.2025.3580611

Understanding Brain Functional Dynamics Through Neural Koopman Operator with Control Mechanism

Zhixuan Zhou 1,2, Tingting Dan 3, Guorong Wu 4,5
PMCID: PMC12628071  NIHMSID: NIHMS2121522  PMID: 40526556

Abstract

One of the fundamental scientific problems in neuroscience is to understand how cognition and behavior emerge from brain function. Since the neuroscience concept of cognitive control parallels the notion of system control in engineering, many computational models formulate the dynamic neural process as a dynamical system, where the hidden states of the complex neural system are modulated by energetic stimulations. However, the human brain is a quintessential complex biological system. Current computational models either use neural networks to approximate the underlying dynamics, which makes it difficult to fully understand the system mechanics, or fall back on simplified linear models with very limited power to characterize the non-linear, self-organized dynamics underlying complex neural activities. To address this challenge, we devise an end-to-end deep model to identify the underlying brain dynamics based on Koopman operator theory, which allows us to model a complex non-linear system in an infinite-dimensional linear space. In the context of reverse engineering, we further propose a biology-inspired control module that adjusts the input (neural activity data) based on feedback to align brain dynamics with the underlying cognitive task. We have applied our deep model to predict cognitive states from a large collection of existing neuroimaging data by identifying the latent dynamic system of functional fluctuations. Promising results demonstrate the potential of establishing a system-level understanding of the intricate relationship between brain function and cognition through the landscape of explainable deep models.

Index Terms: Brain dynamics, fMRI analysis, Koopman operator, System identification, Variational inference

I. Introduction

The human brain is a complex system physically wired by massive bundles of white matter fibers [1]. On top of these intertwined structural connectomes, ubiquitous neuronal oscillations give rise to remarkable functional fluctuations that synchronize across large-scale neural circuits and support the myriad high-level cognitive functions necessary for everyday living [2]. With the prevalence of functional neuroimaging technology in many neuroscience studies, a central idea in understanding the mind is that the dynamic nature of the human brain cannot be understood by treating the system as a collection of independent components [3]. In the broader scope of computational neuroscience, the overarching goal is to elucidate the system mechanism to the extent that the computational model can identify the governing equation of the underlying complex system. In this regard, there is a critical need to establish a system-level understanding of how the brain switches between different mental states and how the self-organized behavior of functional fluctuations supports cognitive states.

Tremendous efforts have been made to model the evolution of neural activity observations (such as fMRI) towards the accurate prediction of brain states associated with underlying cognitive tasks, and to find a mechanistic explanation for the brain's underlying control of information processing and decision making [4]. Since the neuroscience notion of a neural process is analogous to the dynamical system used in engineering, many existing works model brain functional activity as a dynamical system [5], [6], where the hidden states of the complex neural system are modulated by (external) energetic stimulations. In [6], a linear system is used to model the temporal trajectory of the BOLD (blood-oxygen-level-dependent) signal extracted from fMRI, where the transition of brain states is restricted to the wiring pattern of neuronal fiber bundles. Nevertheless, the human brain is a fundamentally complex biological system, and current linear models lack sufficient capability to capture its non-linear, self-organized dynamics and intricate neural activities. Numerous deep neural networks have been proposed to infer neuron population dynamics [7], [8] by projecting neural activity observations to a latent low-dimensional embedding space in a layer-by-layer manner. However, the 'black box' nature of the learning mechanism leaves current deep models lacking the explainability needed for a biologically meaningful understanding of functional dynamics.

To address these limitations, we present an explainable deep model BRICK (BRain dynamics Identification using Control and Koopman operator) by resolving the following three challenges.

  • Challenge 1: How to capture complex non-linear brain dynamics while maintaining mathematical tractability for systematic analysis? We tackle this challenge by employing Koopman operator theory [9], [10]. Koopman theory [9] has been used to discover the underlying dynamics of nonlinear complex systems through an infinite-dimensional linear operator acting on the space of all possible measurement functions of the system. Specifically, Koopman operator theory focuses on functions of the states (a.k.a. "observables" or "measurements") instead of the states themselves. The theory states that any sufficiently regular nonlinear autonomous dynamical system can be made linear under a high-dimensional non-linear blow-up of the state space [11]. This guarantee allows us to design/learn a large number of "measurements" that fit the non-linear dynamics in linear form. Furthermore, we introduce a neural Koopman operator to linearize the complex nonlinear brain dynamics, which allows us to design an end-to-end deep model for large-scale functional neuroimages.

  • Challenge 2: How to find a series of measurements that can form a Koopman invariant subspace? In theory, the Koopman operator seeks to express the full behavior of a dynamical system by spanning an infinite-dimensional space of measurement functions [12]. However, it is computationally impractical to find infinitely many measurement functions. In this regard, it is common practice to approximate the Koopman operator using a subset of measurement functions. For example, many existing methods use basis functions such as sine, cosine, polynomials, and radial basis functions to model non-linear systems [13], [14]. Considering the self-organized behavior of functional connectivity along with evolving cognitive states, we propose a permutation-equivariant spatiotemporal neural network that generates region-trackable measurements to approximate the Koopman invariant subspace. Specifically, we first calculate the region-to-region functional connectivity from the BOLD signals, where the wiring patterns form the backbone of a dynamical system supporting the functional fluctuations. The functional connectivity captures the temporal dependencies and is then processed by multiple permutation-equivariant neural networks, which leverage the spatial structure of brain regions while preserving anatomical interpretability, to generate a set of biologically meaningful measurements.

  • Challenge 3: How to design an interpretable control module to regulate the evolution of the dynamics? To guarantee that the learned measurements can form the Koopman invariant subspace, we propose a novel control module, which can generate a task-specific controller for unseen individual subjects based on their observations and cognitive states. With this controller, the system can correct the evolution of its behavior according to the observations, reducing the accumulated error while offering neuroscience insight into cognitive control.

The outcome is an end-to-end deep model with great scalability to large-scale neuroimaging data, which allows us to advance our understanding of brain function through data-driven approaches. The major novelties and contributions are summarized as follows:

  • We integrate the Koopman operator and control into an end-to-end variational inference model, which allows us to uncover the working mechanism of the dynamical system underlying brain function.

  • The latent brain dynamics identified by our method enable the modeling of communication between regions and the discovery of how particular brain regions regulate the evolution of self-organized functional fluctuations.

  • The comprehensive comparison experiments on classic physics systems and real-world fMRI datasets demonstrate that the model achieves state-of-the-art performance. We also conduct extensive exploratory experiments.

II. Related Work

This section summarizes existing literature on modeling dynamical systems through data-driven methods, with a focus on extensions to neural processes of the human brain.

Dynamic system identification in machine learning aims to capture the temporal behaviors of complex systems. One of the classic approaches for system identification is extended dynamic mode decomposition (EDMD) [15]. EDMD introduces a dictionary of nonlinear functions to map the system states into a higher-dimensional space. This mapping allows for a better approximation of complex, nonlinear dynamics by leveraging Koopman operator theory. Another method called sparse identification of nonlinear dynamics (SINDy) [16] offers a novel approach to discovering governing equations from data. Unlike traditional methods, SINDy leverages sparse regression techniques to identify the most relevant terms from a library of candidate functions, effectively balancing model complexity with accuracy. While these methods work well on many systems, their performance depends heavily on the choice of dictionary functions, which requires significant domain expertise, and on the quality and quantity of measurement data. Furthermore, the requirement for clean, noise-free measurements is often difficult to meet in real-world applications.
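To make the EDMD recipe above concrete, the following sketch lifts snapshot pairs through a hand-picked dictionary and fits the finite-dimensional Koopman approximation by ordinary least squares; the polynomial dictionary shown here is only an illustrative choice, which is exactly the sensitivity to dictionary selection discussed above.

```python
import numpy as np

def poly_dictionary(X):
    """Illustrative dictionary: constant, linear, and pairwise quadratic terms."""
    quad = np.einsum('ti,tj->tij', X, X).reshape(X.shape[0], -1)
    return np.hstack([np.ones((X.shape[0], 1)), X, quad])

def edmd_koopman(X, Y, dictionary=poly_dictionary):
    """EDMD: approximate the Koopman operator from snapshot pairs.

    X, Y : (T, n) arrays of states with Y[t] = f(X[t]).
    Returns the (M, M) matrix K such that dictionary(Y) ~= dictionary(X) @ K.
    """
    Psi_X, Psi_Y = dictionary(X), dictionary(Y)         # lift states into measurement space
    K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)   # least-squares Koopman fit
    return K
```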

Recently, a number of deep learning models have been proposed to identify complex system dynamics. For example, the piecewise-linear recurrent neural network (PLRNN) [17] presents a specialized state-space modeling approach for understanding neural dynamics from neuroimaging data. Unlike traditional neural networks, PLRNN combines the interpretability of linear dynamical systems with the expressiveness of nonlinear models through a piecewise-linear formulation. The dendritically expanded piecewise-linear recurrent neural network (dendPLRNN) [18] enhances the original PLRNN framework by incorporating principles from dendritic computation through a linear spline basis expansion. This extension maintains the interpretability and mathematical tractability of PLRNNs while significantly improving their capacity to approximate complex nonlinear dynamics in lower-dimensional spaces. However, the training of these models relies on a traditional sequential scheme and scheduled sampling [19], which have been reported to make convergence difficult.

In the field of neuroscience, there are many parallel works that focus on identifying the latent dynamics of populations of neurons. Latent factor analysis via dynamical systems (LFADS) [8] addresses the challenge of understanding neural population dynamics from single-trial recordings. This deep learning approach aims to infer underlying neural dynamics from noisy, incomplete neural spiking data without requiring trial-to-trial averaging. LFADS employs a variational auto-encoder framework with a dynamical systems core to reconstruct latent neural trajectories, making it particularly powerful for analyzing motor cortical activity. Meanwhile, the Poisson latent neural differential equations (PLNDE) approach [7] offers a novel window into identifying latent dynamics in neural populations by combining neural ordinary differential equations with Poisson observations. This framework specifically addresses the challenges of modeling spike train data through a low-dimensional nonlinear dynamical system. PLNDE stands out for its ability to accurately infer phase portraits and fixed points from neural data, making it particularly valuable for analyzing decision-making tasks and other cognitive processes. Nevertheless, the latent space representations of these methods, while powerful for prediction, may not always be easily interpretable from a neurobiological perspective. Furthermore, current state-of-the-art methods are limited to neuron populations of specific brain regions rather than the entire brain network dynamics.

Besides these methods, some recent works have considered using Koopman operator theory to identify brain dynamics in a linear space. For example, Gallos et al. [20] proposed a multi-stage framework to identify fMRI dynamics, whose application is concurrent with our work. It first utilizes a diffusion map for dimension reduction; then either feedforward neural networks with geometric harmonics or the Koopman operator framework is applied to the low-dimensional embedding, and finally the pre-image problem is solved. Another work [21] focused on ECoG data in a mismatch negativity experiment using a stand-alone Koopman operator approach based on kernel methods without control components. Deep-GraphDMD [22] employs an auto-encoder architecture to learn Koopman eigenfunctions for graph data, effectively embedding non-linear network dynamics into a latent linear space. However, these methods aim to obtain common dynamics across different subjects, which is difficult due to high inter-individual variability.

In addition to Koopman operator theory, several promising neural operators have been proposed for system identification. The multiwavelet neural operator [23], [24] utilizes wavelets and orthogonal polynomials to efficiently represent the operator for single and coupled PDEs. The Padé exponential operator [25] was proposed to solve initial value problems. Most of these methods focus on directly learning the complex non-linear dynamics of a system.

III. Method

In this section, we present our learning-based method BRICK for identifying and understanding the dynamical system of functional brain networks. In III-A, we first briefly introduce Koopman operator theory and utilize it to linearize the complex brain dynamics in a higher-dimensional measurement space. Then in III-B, we introduce how to generate measurements that leverage the spatio-temporal information of brain functional fluctuations. In III-C, we extend the proposed system to a closed-loop feedback control system with a novel control module. Finally, we describe the objective function and the corresponding optimization procedure in III-D. The variable notations we use are listed in Table I.

TABLE I:

Notations and Definitions

| Notation | Definition |
| --- | --- |
| $x_{1:T}$ | The BOLD signal sequence with length T over N brain regions, regarded as the states and observations of the original brain system |
| $g_{1:T}$ | The measurement functions with length T and dimension M generated from the BOLD signals |
| N | The number of brain regions (ROIs) |
| H | The feature dimension of each node (ROI) |
| M | The dimension/number of measurements, M = H × N |
| K | Koopman operator with shape M × M |
| C | Control matrix with shape M × M |
| A | The functional connectivity with shape N × N |
| $p_\theta(x)$ | The unknown distribution of observations $x_{1:T}$ |
| $p_\theta(g_0)$ | The prior distribution of the initial condition $g_0$, assumed Gaussian |
| $p_\theta(u)$ | The prior distribution of control inputs $u_{1:T}$, assumed Gaussian at each step |
| $p_\theta(x_{1:T} \mid u_{1:T}, g_0)$ | The posterior distribution of observations |
| $q_\psi(g_0 \mid x_{1:T})$ | The posterior distribution of the initial condition $g_0$ |
| $q_\psi(u_{1:T} \mid x_{1:T})$ | The posterior distribution of control inputs $u_{1:T}$ |

A. Linearizing Brain Dynamics with Koopman Operator

We formulate the brain network as a non-linear discrete-time dynamical system $x_{t+1} = f(x_t)$, where $f: \mathcal{X} \to \mathcal{X}$ represents the state transition function mapping the state $x_t$ at time $t$ to its subsequent state $x_{t+1}$. The states $x_{1:T} \in \mathbb{R}^{N \times T}$ comprise the temporal evolution of BOLD signal snapshots across $N$ brain regions. The fundamental principle underlying our methodology leverages Koopman operator theory to transform the non-linear brain state dynamics into a linear evolution in a higher-dimensional measurement space. Specifically, Koopman operator theory establishes that a sufficiently regular nonlinear dynamical system admits representation as an infinite-dimensional linear system [9], [10], [26]. Formally, for any measurement function $g: \mathcal{X} \to \mathbb{R}$, the Koopman operator $\mathcal{K}$ is defined by:

$$\mathcal{K} g(x) = g(f(x)) \tag{1}$$

However, finding such an infinite-dimensional $\mathcal{K}$ is computationally impractical. Therefore, an alternative solution is to find a Koopman invariant subspace $\mathcal{G}$ such that $\forall g \in \mathcal{G}$, $\mathcal{K} g \in \mathcal{G}$. If the Koopman invariant subspace $\mathcal{G}$ is spanned by a finite number of basis functions $\mathbf{g} = \{g_1, \ldots, g_M\}$, we can use a restricted Koopman operator $K \in \mathbb{R}^{M \times M}$ to approximate the original infinite-dimensional Koopman operator $\mathcal{K}$. We then obtain the approximated underlying brain network dynamics:

$$g_{t+1} = K g_t, \qquad \hat{x}_{t+1} = W_x g_{t+1} \tag{2}$$

where the subscript $t$ denotes the time point and $W_x$ transforms the measurements back to the original observations. Nevertheless, it is impossible to determine a finite-dimensional Koopman invariant subspace that includes the original state variables for any system with multiple fixed points or any more general attractors [26]. Therefore, in the next two subsections, we aim to learn a set of measurement functions $\mathbf{g}$ that is as precise as possible (in III-B) and design an interpretable control module (in III-C) to reduce the accumulated error caused by the gap between the true Koopman operator $\mathcal{K}$ and its approximated counterpart $K$.
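As a minimal illustration of Eq. 2, once a restricted operator K and a linear decoder W_x are available (both hypothetical here), prediction reduces to repeated matrix multiplication in the measurement space followed by a linear read-out:

```python
import numpy as np

def koopman_rollout(K, W_x, g0, steps):
    """Evolve measurements linearly (Eq. 2) and decode predicted observations.

    K   : (M, M) restricted Koopman operator.
    W_x : (N, M) linear map from measurements back to BOLD space.
    g0  : (M,)   initial measurement vector.
    """
    g, preds = g0, []
    for _ in range(steps):
        g = K @ g                  # linear evolution in the measurement space
        preds.append(W_x @ g)      # map back to the original observation space
    return np.stack(preds)         # (steps, N) predicted BOLD snapshots
```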

B. Generating Initial Condition by Spatio-Temporal Measurements

In this section, we propose a method to generate basis measurement functions g while simultaneously inferring the initial conditions of the dynamical system.

1). Functional Connectivity:

Brunton et al. [27] demonstrated that a Koopman invariant subspace can be approximated using a finite set of time-delay measurements. This concept aligns with brain functional connectivity principles, defined as the temporal dependency of neuronal activation patterns across anatomically separated brain regions [28]. Therefore, we begin by calculating the functional connectivity of the given data. Specifically, given a series of BOLD signal snapshots $x_{1:T} \in \mathbb{R}^{N \times T}$, where $N$ is the number of brain regions and $T$ denotes the length of the BOLD signal sequence (the same as the states $x_{1:T}$ in III-A), we compute the Pearson correlation [29] between each pair of nodes. This yields the functional connectivity (FC) matrix $A \in \mathbb{R}^{N \times N}$, where each entry $A_{ij}$ is the Pearson correlation between nodes $i$ and $j$. Subsequently, we apply a row-wise MLP to reduce the dimensionality from $N$ to $H$, producing a temporal intermediate representation $Z_0 \in \mathbb{R}^{N \times H}$, where $H \ll N$.
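A minimal sketch of this step is given below, assuming the BOLD snapshots are arranged as an N x T array; the feature width H and the single-layer row-wise MLP are illustrative assumptions, since the exact MLP architecture is not specified here.

```python
import numpy as np
import torch
import torch.nn as nn

def functional_connectivity(x):
    """Pearson-correlation FC matrix A (N x N) from BOLD snapshots x (N x T)."""
    return np.corrcoef(x)                       # entry (i, j): correlation of regions i, j

N, T, H = 116, 125, 8                           # example sizes with H << N
row_mlp = nn.Sequential(nn.Linear(N, H), nn.ReLU())   # row-wise dimensionality reduction

x = np.random.randn(N, T)                       # placeholder BOLD signals
A = torch.tensor(functional_connectivity(x), dtype=torch.float32)
Z0 = row_mlp(A)                                 # (N, H) temporal intermediate representation
```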

2). Coupling with Spatial Information:

While the intermediate state $Z_0$ captures functional information, it does not account for the brain network's spatial information. Therefore, we employ a neural network to model inter-node interactions and obtain the final spatio-temporal measurements. It is crucial to maintain awareness of region-to-region communication in the underlying dynamic state (i.e., the measurement space). To achieve this, the spatial encoder network $\Phi$ (green box in Fig. 1) should be permutation-equivariant, meaning that $\forall \sigma \in S_N$, $\Phi(\sigma Z_0) = \sigma \Phi(Z_0)$, where $S_N$ is the permutation group. This spatial encoder $\Phi$ enables us to maintain correspondence between brain regions and their latent measurements. For fair comparison and ease of application, we adopt the TransformerEncoder [30] as our encoder $\Phi$, which provides a general permutation-equivariant architecture. Alternative models such as MPNN [31] or GCN [32], which specialize in graph structures, could also be employed. Finally, rather than directly using the output as the initial-condition measurement, we consider the initial condition to be drawn from a prior distribution $g_0 \sim p_\theta(g_0)$ with $p_\theta(g_0) = \mathcal{N}(0, \epsilon I)$, accounting for the stochastic nature and complexity of brain functional fluctuation dynamics. The approximate posterior distribution of $g_0$ is $q_\psi(g_0 \mid x_{1:T})$, a Gaussian with mean and covariance determined by two separate TransformerEncoders:

$$\mu_{g_0} = \Phi_\mu(Z_0) \tag{3}$$
$$\log \Sigma_{g_0}^2 = \Phi_\Sigma(Z_0) \tag{4}$$
$$\Sigma_{g_0} = \exp\left(\tfrac{1}{2} \log \Sigma_{g_0}^2\right) \tag{5}$$
$$\mathbf{g}_0 \sim \mathcal{N}(\mathbf{g}_0 \mid \mu_{g_0}, \Sigma_{g_0}) \in \mathbb{R}^{N \times H} \tag{6}$$
$$g_0 = \mathrm{vec}(\mathbf{g}_0) \in \mathbb{R}^{M} \tag{7}$$

where $\mathrm{vec}(\cdot)$ is the vectorization function and $M = N \times H$ is the number (or dimension) of measurements.
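The sketch below illustrates Eqs. 3–7, using two PyTorch TransformerEncoders (which are permutation-equivariant when no positional encoding is added) as Φ_μ and Φ_Σ together with the standard reparameterization trick; the layer count and head number are assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class InitialCondition(nn.Module):
    """Sketch of the variational initial-condition module (Eqs. 3-7)."""
    def __init__(self, H, nhead=2, num_layers=2):
        super().__init__()
        make_encoder = lambda: nn.TransformerEncoder(      # H must be divisible by nhead
            nn.TransformerEncoderLayer(d_model=H, nhead=nhead, batch_first=True),
            num_layers=num_layers)
        self.phi_mu, self.phi_sigma = make_encoder(), make_encoder()

    def forward(self, Z0):                        # Z0: (B, N, H) intermediate FC features
        mu = self.phi_mu(Z0)                      # Eq. 3
        log_var = self.phi_sigma(Z0)              # Eq. 4: log of the squared std
        std = torch.exp(0.5 * log_var)            # Eq. 5
        g0 = mu + std * torch.randn_like(std)     # Eq. 6: reparameterized Gaussian sample
        return g0.flatten(1), mu, log_var         # Eq. 7: vec(g0) has dimension M = N * H
```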

Fig. 1:

Overview of our deep model BRICK. In general, our model consists of three components. (a) Data processing. The input to our model includes the mean time course of BOLD signal x and functional connectivity matrix A. (b) System identification. The learning process for identifying the governing equation of the underlying functional fluctuations involves a neural Koopman operator for system linearization and a control module for system correction. These components work together collaboratively to uncover the system’s mechanism from the observed measurements. (c) Driving force of system identification. The objective function represents the overarching goal of accurately predicting future behavior and reconstructing the input based on the latent states.

C. Task-Related Control Module

In this section, we introduce a novel task-related control module, extending the proposed autonomous system to a closed-loop feedback control system, which allows the system to receive observations and produce corresponding control signals that regulate the evolution of the dynamics. Specifically, we modify the system in Eq. 2 into the following form:

$$g_{t+1} = K g_t + C u_{t+1} \tag{8}$$

where $C \in \mathbb{R}^{M \times M}$ is the control matrix and can be regarded as part of the system, while $u_{1:T}$ denotes the external control inputs.

Following the autonomous dynamics captured by the Koopman operator K, which represents the shared, intrinsic brain dynamics across individuals, our BRICK framework addresses individual variability by proposing that the control matrix C should be individual-specific and task-related. While we parametrize a common Koopman operator K to maintain consistency of underlying neural dynamics across subjects, the personalization of the control module C reflects the fact that different individuals may require distinct control strategies based on task-related state, even when their fundamental neural dynamics follow similar patterns.

To generate the individual-specific control matrix $C$, we employ an encoder-based architecture that learns from functional observations. Specifically, we first encode the observations $x_{1:T}$ into a latent embedding $E = \mathrm{Encoder}(x_{1:T}) \in \mathbb{R}^{M \times T}$ (shown at the top of Fig. 1). This embedding serves dual purposes: (1) it is used to generate the control matrix $C = \psi_\theta(e)$ and the control inputs $u_{1:T}$; (2) it simultaneously feeds into a classifier that predicts the task state, i.e., $\hat{s} = \mathrm{Classifier}(e)$, where $e$ is obtained by applying an average pooling layer to $E$ along the time dimension. This architecture ensures that the learned control matrices are grounded in functional activity patterns while maintaining task relevance, as enforced by the classification objective. The Encoder we use here is a Transformer encoder [30] followed by an MLP that adjusts the dimension from $N$ to $M$. Notably, we specifically use a simple linear layer as the Classifier. This design choice ensures that the task-relevant information is primarily encoded in the embedding space rather than being extracted by a sophisticated classifier. As for the control matrix generation, we simplify the control matrix to a diagonal matrix to prevent over-fitting and reduce the computational overhead. Together, we have $\mathrm{diag}(C) = \tanh(W e + b)$.

Next, we use a variational inference scheme to generate feature-wise control input signals according to the observations. We assume the control inputs are governed by a Gaussian prior $p_\theta(u_t) = \mathcal{N}(0, I)$; we then use a posterior $q_\phi(u_t \mid x_{1:t})$ to approximate this module by:

$$\begin{aligned}
E &= \mathrm{Transformer}(x_{1:T}) \\
\mu_{u_{1:T}} &= \mathrm{MLP}_\mu(E), \quad \log \Sigma_{u_{1:T}}^2 = \mathrm{MLP}_\Sigma(E) \\
\Sigma_{u_{1:T}} &= \exp\left(\tfrac{1}{2} \log \Sigma_{u_{1:T}}^2\right) \\
q_\phi(u_t \mid x_{1:t}) &= \mathcal{N}(u_t \mid \mu_{u_t}, \Sigma_{u_t}) \\
q_\phi(u_{1:T} \mid x_{1:T}) &= \prod_{t=1}^{T} q_\phi(u_t \mid x_{1:t})
\end{aligned} \tag{9}$$
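Putting this subsection together, the sketch below shows one plausible realization of the control module: a Transformer encoder producing the embedding E, a linear classifier on the time-pooled embedding e, the diagonal control gate diag(C) = tanh(We + b), and the Gaussian posterior over u_{1:T} of Eq. 9. Layer sizes and head counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ControlModule(nn.Module):
    """Sketch of the task-related control module (Eqs. 8-9)."""
    def __init__(self, N, M, n_tasks, nhead=4, num_layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=N, nhead=nhead, batch_first=True)
        self.encoder = nn.Sequential(                  # x_{1:T} -> E, adjusting dim N -> M
            nn.TransformerEncoder(enc_layer, num_layers=num_layers), nn.Linear(N, M))
        self.classifier = nn.Linear(M, n_tasks)        # simple linear task classifier
        self.gate = nn.Linear(M, M)                    # diag(C) = tanh(W e + b)
        self.mlp_mu, self.mlp_sigma = nn.Linear(M, M), nn.Linear(M, M)

    def forward(self, x):                              # x: (B, T, N) BOLD sequence, nhead | N
        E = self.encoder(x)                            # (B, T, M) latent embedding
        e = E.mean(dim=1)                              # average pooling along time
        task_logits = self.classifier(e)               # task-state prediction s_hat
        C_diag = torch.tanh(self.gate(e))              # diagonal of the control matrix C
        mu, log_var = self.mlp_mu(E), self.mlp_sigma(E)            # posterior over u_{1:T}
        u = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterized sample
        return task_logits, C_diag, u, mu, log_var
```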

D. Training and Optimization

So far, we have modeled the brain network dynamics with the following components: (1) the initial condition $g_0$; (2) the dynamical system $g_{t+1} = K g_t + C u_{t+1}$; and (3) the inferred control inputs $u_{1:T}$. We now introduce the training scheme and optimization objective. During the training stage, the complete sequence of observations $x_{1:T}$ is available, allowing us to leverage the parallel processing capabilities of the Transformer architecture by computing the embeddings and control signals for all time steps simultaneously, rather than sequentially. Then, for the system dynamics $g_{t+1} = K g_t + C u_{t+1}$, we adopt the parallel scan algorithm and the eigenvalue initialization method introduced in [11].

The objective function consists of several components: (1) the embedding classification loss $\mathcal{L}_{cls} = \mathrm{CrossEntropy}(s, \hat{s})$, where $s$ and $\hat{s}$ are the true task state and the one predicted from the embedding $e$, respectively; (2) the maximization of the log-likelihood of the given observations. For compactness and readability, we abbreviate $x_{1:T}$, $g_{1:T}$, $u_{1:T}$ as $x$, $g$, $u$:

$$\begin{aligned}
\log p_\theta(x) &\geq \mathcal{L}_{ELBO} = \mathcal{L}_{likelihood} - \mathcal{L}_{KL} \\
\mathcal{L}_{likelihood} &= \mathbb{E}_{q_\psi(u \mid x),\, q_\psi(g_0 \mid x)} \left[ \sum_{t=1}^{T} \log p_\theta(x_t \mid u_t, g_0) \right] \\
\mathcal{L}_{KL} &= D_{KL}\big(\mathcal{N}(g_0 \mid \mu_{g_0}, \Sigma_{g_0}) \,\|\, p_\theta(g_0)\big) + D_{KL}\big(q_\psi(u \mid x) \,\|\, p_\theta(u)\big)
\end{aligned} \tag{10}$$

where $\mathcal{L}_{likelihood}$ is the log-likelihood of the reconstruction $\hat{x}$ in comparison with the observed data $x$ under Gaussian noise with variance $\lambda$, while $\mathcal{L}_{KL}$ represents the KL divergence penalties that regularize both the initial latent state $g_0$ and the inferred inputs $u$ to stay close to their respective prior distributions.

Taken together, we formulate the optimization objective as:

$$\mathcal{L} = \beta \mathcal{L}_{cls} + (1 - \beta) \mathcal{L}_{ELBO} \tag{11}$$

where β is a balance hyperparameter.
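The sketch below shows one way to assemble Eqs. 10–11 into a single training loss; it assumes standard-normal priors for both g_0 and u (the paper uses N(0, εI) for g_0) and minimizes the negative ELBO, so signs and constants are illustrative rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def brick_loss(x, x_hat, task_logits, s, mu_g0, log_var_g0, mu_u, log_var_u,
               beta=0.5, noise_var=1.0):
    """Classification loss plus negative ELBO, weighted by beta (Eqs. 10-11)."""
    l_cls = F.cross_entropy(task_logits, s)                 # embedding classification loss
    # Gaussian log-likelihood of the reconstruction (up to an additive constant).
    l_like = -0.5 * ((x - x_hat) ** 2 / noise_var).sum(dim=(-1, -2)).mean()
    # KL(N(mu, sigma^2) || N(0, I)) for the initial condition g0 and the inputs u.
    kl = lambda mu, lv: 0.5 * (mu.pow(2) + lv.exp() - lv - 1).sum(dim=-1).mean()
    l_kl = kl(mu_g0, log_var_g0) + kl(mu_u, log_var_u)
    elbo = l_like - l_kl
    return beta * l_cls + (1 - beta) * (-elbo)              # minimized during training
```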

IV. Experiments

To evaluate the power of our model BRICK for the system identification task, we first fit our model on a synthetic dataset generated from a nonlinear dynamical system. We generated the trajectories of the system by (1) randomly selecting different initial values and system parameters, and (2) randomly sampling fixed-length trajectories from the system. Next, we conduct experiments on the real datasets HCP-A and ABIDE, which are described in detail in the corresponding section. We split the data into training, validation, and test sets with a ratio of 7:1:2. We use grid search on the training and validation sets to find the optimal hyperparameter combination and finally evaluate the models on the test set. The baselines include traditional data-driven system identification methods and deep learning methods. All experiments are conducted on an NVIDIA RTX 6000 Ada GPU with 48 GB memory. Here are the details:

  • EDMD [15]. The main idea of extended dynamic mode decomposition is to find a finite-dimensional linear map $K$ whose spectral properties approximate those of the true Koopman operator $\mathcal{K}$. This can be achieved by selecting a set of measurements and solving a corresponding least-squares problem.

  • SINDy [16]. The sparse identification of nonlinear dynamics (SINDy) algorithm seeks to approximate the dynamics function $\frac{dx}{dt} = f(x)$ with a generalized linear model whose basis consists of polynomials or trigonometric functions. By solving a sparse regression problem, only a few bases remain active in the dynamical system.

  • Transformer [30]. The Transformer is a deep model that excels at learning long-range dependencies thanks to its self-attention mechanism, and it has been successfully applied to dynamical system modeling [33].

  • PLRNN [17]. The piecewise-linear recurrent neural network (PLRNN) is a nonlinear state-space RNN composed of rectified-linear units. PLRNN is one of the state-of-the-art models specialized for dynamics reconstruction and system identification.

  • dendPLRNN [18]. The dendritic PLRNN is an augmentation of PLRNN inspired by dendritic processing and is implemented via a linear spline basis expansion.

  • LRU [11]. The Linear Recurrent Unit (LRU) is a linear RNN that can efficiently handle long sequences and shows strong performance on sequence tasks.

  • DLinear [34]. DLinear is a simple yet very effective baseline for time-series forecasting, consisting of just linear transformations along the time dimension.

A. Lorenz System

The Lorenz system is notable for having chaotic solutions for certain parameter values and initial conditions, especially the butterfly-shaped Lorenz attractor. It is a good example showing that chaotic systems can be completely deterministic and yet still be inherently impractical, or even impossible, to predict over longer periods. A Lorenz-63 system 1 takes the following form:

$$\frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z \tag{12}$$

Here we use the classical setting of $\rho = 28$, $\sigma = 10$, $\beta = 8/3$ with Gaussian noise $\epsilon \sim \mathcal{N}(0, 0.01^2 \times I)$. To test whether the models can properly reconstruct the dynamics of the system, we report the mean squared error over the first 64, 128, and 200 steps separately, repeated 100 times with different dataset divisions and model initializations. Since the dimension of the Lorenz system is low (three dimensions), the hidden dimension of all models is set to 32. For PLRNN and dendPLRNN, the teacher forcing interval is set to 25 and the number of bases of dendPLRNN is 20, the same as in the original paper. The threshold of SINDy is set to 0.05. Since the Lorenz system does not involve task-dependent changes of dynamics, we remove the task-embedding component from our BRICK model for this synthetic dataset.
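For reference, noisy Lorenz-63 trajectories of this kind can be generated along the following lines; the integration step, initial-value range, and the way noise is injected are illustrative assumptions rather than the exact data-generation protocol.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz63(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def sample_trajectory(length=1000, dt=0.01, noise_std=0.01, seed=0):
    """One noisy Lorenz-63 trajectory from a random initial condition."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-15, 15, size=3)                  # random initial value
    t_eval = np.arange(length) * dt
    sol = solve_ivp(lorenz63, (0.0, t_eval[-1]), x0, t_eval=t_eval)
    return sol.y.T + rng.normal(0.0, noise_std, size=(length, 3))  # additive Gaussian noise
```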

Fig. 2 shows the predicted trajectories of different methods for the same initial condition over 1,000 prediction steps. As shown in Fig. 2, most of the methods reconstruct the Lorenz-63 system reasonably well, except EDMD. Specifically, BRICK produces the most faithful reconstruction, showing a clear butterfly-like structure with well-defined lobes and smooth trajectories that match closely with typical Lorenz attractor characteristics, revealing that it is possible to recover the system dynamics by combining Koopman operator theory with control inputs. DendPLRNN and PLRNN show a similar overall structure but with some distortions: their trajectories are less smooth and the lobes appear slightly warped compared to the expected Lorenz pattern. The Transformer method generates dense trajectories but with substantial noise and irregularities in the attractor structure because it relies heavily on precise observations. SINDy and EDMD show less faithful results than the deep models.

Fig. 2:

Comparison of prediction methods on the Lorenz-63 system over 1,000 time steps. Each method started from the same initial condition, with the blue line showing the ground truth trajectory and colored points indicating predictions from different methods.

Table II displays the mean squared error (MSE) at different prediction horizons for each method. Our BRICK method achieves 2nd place in short-range prediction, 1st place in mid-range prediction, and 2nd place in long-range prediction. SINDy shows comparable performance (3rd place in short-range, 2nd place in mid-range, and 1st place in long-range prediction) in terms of MSE. However, our method uncovers the attractor structure with significantly greater accuracy than SINDy.

TABLE II:

Mean squared error (MSE) comparison of different prediction methods on the Lorenz-63 system. Results are shown for predictions over 64, 128, and 200 time steps, reported as mean ± standard deviation.

| Model | MSE (64 steps) | MSE (128 steps) | MSE (200 steps) |
| --- | --- | --- | --- |
| EDMD | 2.247 ± 0.150 | 1.363 ± 0.117 | 1.138 ± 0.107 |
| SINDy | 0.054 ± 0.020 | 0.101 ± 0.032 | 0.202 ± 0.045 |
| Transformer | 0.090 ± 0.030 | 0.321 ± 0.057 | 0.438 ± 0.066 |
| PLRNN | 0.056 ± 0.008 | 0.367 ± 0.019 | 0.666 ± 0.026 |
| dendPLRNN | 0.021 ± 0.005 | 0.190 ± 0.014 | 0.430 ± 0.021 |
| Ours (BRICK) | 0.042 ± 0.006 | 0.082 ± 0.010 | 0.388 ± 0.020 |

B. Performance Evaluation on fMRI Dataset

1). Data Description:

(1) The Autism Brain Imaging Data Exchange (ABIDE) [35] provides anonymized resting-state functional magnetic resonance imaging (rs-fMRI) data collected from 17 international sites. Our analysis used brain networks and BOLD signals from 1009 subjects, including 516 Autism spectrum disorder (ASD) patients (51.14% positive cases). Brain regions were defined using the Craddock 200 atlas [36]. (2) The Human Connectome Project-Aging (HCP-A) dataset [37] includes data from 717 subjects, encompassing both fMRI (4,846 time series) and Diffusion Weighted Imaging (DWI) (717) scans. This rich collection facilitates in-depth analyses of both functional and structural connectivity. The HCP-A dataset includes data from four brain tasks associated with memory: VISMOTOR, CARIT, FACENAME, and resting state. Each fMRI scan consists of 125 time points. In the following experiments on HCP-A, these tasks are treated as distinct categories in a four-class classification problem. We partition each brain into 116 regions using the AAL atlas [38]. Thus, structural connectivity is a 116 × 116 matrix where each element is quantified by the number of nerve fibers linking two brain regions [39].

2). Evaluation Metrics:

The metrics of this experiment are as follows: (1) the mean squared error (MSE) of the 10-step predicted BOLD signal; (2) the approximated Kullback-Leibler divergence $D_{KL}$ between the true distribution $p_\theta(x)$ and the posterior distribution $q_\psi(x \mid g_0, u)$ (see Appendix 6.2 in [18]). The KL divergence evaluates how well the model captures the underlying dynamics of brain states, accounting for natural variability in neural activity patterns. (3) The root mean squared error (RMSE) between the reconstructed functional connectivities and the ground truth. The resulting RMSE measures preservation of network-level relationships between brain regions, which is crucial for understanding brain organization and function.
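Metrics (1) and (3) can be computed as sketched below; the KL approximation of metric (2) follows Appendix 6.2 of [18] and is omitted here, and the array-shape conventions are assumptions.

```python
import numpy as np

def prediction_mse(x_true, x_pred):
    """MSE of the predicted BOLD signal; both arrays shaped (steps, N)."""
    return np.mean((x_true - x_pred) ** 2)

def fc_rmse(x_true, x_recon):
    """RMSE between FC matrices of ground-truth and reconstructed signals (T, N)."""
    fc_true = np.corrcoef(x_true.T)            # (N, N) functional connectivity
    fc_recon = np.corrcoef(x_recon.T)
    return np.sqrt(np.mean((fc_true - fc_recon) ** 2))
```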

3). Implementations:

For all experiments, we use the AdamW [40] optimizer with a cosine annealing schedule [41]. We use grid search to select the hyperparameters. The number of transformer layers is searched within [1, 2, 4], and 2 layers provided the best performance. We test learning rates in [1e-4, 5e-4, 1e-3, 5e-3], with 1e-3 providing the fastest stable convergence. The batch size is 512. All models are trained for 1000 epochs, and the epoch with the highest performance on the validation set is used for performance comparison on the test set.

4). Comparison with Baselines:

Table III shows a comprehensive evaluation of the effectiveness of brain dynamics modeling approaches. First, notice that the performance on ABIDE is worse than that on HCP-A because the fMRI images in ABIDE are collected from multiple centres while the HCP-A data are collected in a highly standardized way. Traditional methods like EDMD and SINDy, while mathematically elegant in their approach to approximating dynamical systems through linear mappings and sparse identification, show limited success in capturing the complex, nonlinear nature of brain dynamics. One should notice that SINDy shows the weakest performance because it is intractable to use polynomial features for such high-dimensional systems (e.g., with a highest order of 3, the number of features scales up to the level of $10^6$). Moreover, SINDy is sensitive to noise in the states, making it less accurate.

TABLE III:

Performance Comparison Across Methods and Datasets (IC means Initial Condition module, C means Control module)

| Method | ABIDE MSE | ABIDE KL | ABIDE RMSE | HCP-A MSE | HCP-A KL | HCP-A RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| EDMD | 0.841 ± 0.020 | 85.976 ± 0.383 | 0.319 ± 0.003 | 0.630 ± 0.015 | 53.715 ± 0.147 | 0.543 ± 0.001 |
| SINDy | 1.735 ± 0.036 | 152.172 ± 1.723 | 0.361 ± 0.003 | 1.097 ± 0.019 | 102.748 ± 0.917 | 0.574 ± 0.003 |
| PLRNN | 0.994 ± 0.015 | 65.330 ± 0.613 | 0.404 ± 0.012 | 0.673 ± 0.013 | 23.358 ± 0.346 | 0.202 ± 0.006 |
| dendPLRNN | 1.006 ± 0.008 | 63.329 ± 2.056 | 0.455 ± 0.016 | 0.654 ± 0.008 | 22.860 ± 0.315 | 0.198 ± 0.004 |
| Transformer | 0.477 ± 0.005 | 30.924 ± 0.538 | 0.193 ± 0.003 | 0.179 ± 0.002 | 5.170 ± 0.103 | 0.079 ± 0.001 |
| DLinear | 0.271 ± 0.015 | 23.012 ± 0.338 | 0.119 ± 0.006 | 0.209 ± 0.008 | 6.721 ± 0.108 | 0.107 ± 0.004 |
| LRU | 0.335 ± 0.013 | 26.824 ± 0.448 | 0.133 ± 0.003 | 0.353 ± 0.012 | 7.090 ± 0.114 | 0.098 ± 0.005 |
| BRICK (w/o C, w/o IC) | 0.428 ± 0.006 | 30.164 ± 0.609 | 0.145 ± 0.004 | 0.159 ± 0.009 | 5.278 ± 0.278 | 0.080 ± 0.004 |
| BRICK (w/o IC) | 0.428 ± 0.012 | 29.320 ± 0.846 | 0.147 ± 0.002 | 0.083 ± 0.004 | 4.298 ± 0.167 | 0.062 ± 0.003 |
| BRICK (w/o C) | 0.309 ± 0.008 | 25.788 ± 0.612 | 0.120 ± 0.005 | 0.153 ± 0.013 | 4.814 ± 0.232 | 0.072 ± 0.004 |
| BRICK | 0.168 ± 0.013 | 14.118 ± 0.996 | 0.081 ± 0.005 | 0.077 ± 0.001 | 4.637 ± 0.054 | 0.073 ± 0.001 |

The neural network-based approaches (PLRNN and dendPLRNN) demonstrate improved performance, likely due to their ability to model nonlinear relationships through a nonlinear state-space model. However, these methods still face challenges in capturing long-range temporal dependencies in brain signals. Compared to their strong performance on the Lorenz system, they struggle on the complex brain dynamics identification tasks. One possible reason is the limited number of parameters, leading to under-fitting. The significant performance leap achieved by the Transformer architecture highlights the importance of attention mechanisms in modelling brain dynamics, as they can effectively capture both local and global temporal relationships in neural activity patterns. Though simple, the linear methods, DLinear and LRU, also show consistent, comparable performance on both datasets.

In contrast, our BRICK method achieves the best performance, particularly in its full setting, suggesting that combining controlled dynamics modeling with spatio-temporal measurements is crucial for accurately capturing brain state transitions. The superior performance of BRICK is evident not only in the aggregate metrics (Table III) but also in its ability to sustain prediction accuracy over time (Fig. 3) and to reconstruct intricate connectivity patterns (Fig. 4). The prediction performance of SINDy and EDMD declines rapidly with increasing time steps, indicating their limitations in handling high-dimensional, nonlinear dynamics, consistent with the analysis above.

Fig. 3:

Comparison of prediction errors across different methods on the ABIDE and HCP-A datasets. The plots show Mean Squared Error (MSE) over increasing prediction steps, with lower values indicating better performance. Each color represents a different prediction method.

Fig. 4:

Comparison of reconstructed functional connectivity matrices across different methods. The top row shows the ground truth connectivity pattern alongside results from EDMD, DLinear and PLRNN methods, while the bottom row displays reconstructions from our proposed method, Transformer, LRU and dendPLRNN. The color scale ranges from −0.6 (dark blue) to 1.0 (dark red), representing correlation strengths.

The connectivity matrices provide additional insight into why the neural network-based approaches perform better: they are able to capture the underlying structure of brain networks, though with varying degrees of accuracy. BRICK's success in both temporal prediction stability and connectivity reconstruction suggests that its architecture effectively combines the benefits of controlled dynamics modeling with the ability to capture complex spatio-temporal patterns in brain activity.

5). Ablation Study:

Table III also shows the performance of BRICK in different ablation settings, obtained by removing the control module (denoted w/o C) and the initial condition module (denoted w/o IC). The base dynamics prototype (BRICK w/o C & w/o IC) establishes a solid foundation with its Koopman operator approach and a fixed control module shared across subjects and tasks, showing initial improvements over existing counterpart methods and similar performance to the Transformer. On top of this, the control module (BRICK w/o IC) demonstrates particular strength in task-related scenarios, as evidenced by its dramatic improvement of the KL divergence from 5.278 to 4.298 on the HCP-A dataset, while showing minimal impact on the resting-state ABIDE data. Since the setup of initial conditions provides sophisticated spatio-temporal measurements and functional connectivity modeling, our method without the control module (BRICK w/o C) still yields promising results for resting-state data, reducing ABIDE's KL divergence from 30.164 to 25.788, while having a slightly negative impact on the task-based HCP-A data.

The contribution of each learning module in BRICK can be explained by examining the fundamental nature of each dataset and the modules' design principles. The control module's limited contribution on ABIDE's resting-state data stems from the spontaneous, unstructured nature of brain activity during rest, where there are no clear task-related state transitions to regulate or control, making the task-based control module and classifier component less effective. Conversely, the initial condition module's reduced effectiveness on HCP-A's task-based data is because task-driven brain states are already highly structured and constrained by task requirements, making additional modeling of functional connectivity and spatial relationships less crucial and potentially introducing unnecessary complexity through stochastic initial-condition modelling. This complementary pattern underscores a key principle in brain dynamics modeling: the importance of aligning modeling components with the underlying characteristics of the neural activity being studied, whether that is the more structured, task-driven nature of HCP-A data or the more variable, spontaneous nature of resting-state ABIDE data.

Finally, the full BRICK setting achieves the best overall performance by leveraging this complementarity: the control module's ability to handle task-specific variations combines synergistically with the initial condition module's capacity to capture underlying brain state representations, resulting in optimal performance on the ABIDE dataset and the second-best performance on HCP-A. This pattern suggests that comprehensive brain dynamics modeling requires both task-specific control mechanisms and sophisticated state representation capabilities, with different components becoming more or less crucial depending on the nature of the neural activity being analyzed.

6). Parameter Sensitivity Analysis:

We also conduct experiments on hyperparameter sensitivity with respect to the latent dimension and the loss balancing parameter β.

As shown in Fig. 6 (a)-(b), we evaluated model performance by varying the latent dimension multiplier from 1× to 3× the original dimension. For the ABIDE dataset, increasing the latent dimension initially improves performance across all metrics. In contrast, the HCP-A dataset demonstrates more varied responses to increased latent dimensions, where all metrics exhibit a U-shaped curve with optimal performance at 2×. These findings indicate that different brain dynamics tasks may require specific optimal dimensionality settings, with task-related data (HCP-A) showing greater sensitivity to dimensional tuning than resting-state data (ABIDE). Overall, the 2× multiplier represents a balanced configuration that achieves strong performance across both datasets and all evaluation metrics.

Fig. 6:

Hyperparameter sensitivity evaluation. (a) The effect of latent dimension on ABIDE dataset. (b) The effect of latent dimension on HCP-A dataset. (c) The effect of loss balance parameter β on ABIDE dataset. (d) The effect of loss balance parameter β on HCP-A dataset. Each experiment is repeated with 10 different random seeds.

The parameter sensitivity analysis for the balance factor β reveals robust performance across a wide range of values for both datasets. As shown in Fig. 6 (c)-(d), we evaluated the impact of β on model performance by varying its value from 0.1 to 0.9. All metrics demonstrate remarkable stability across the entire range of β values. This stability suggests that our model architecture effectively balances the classification and reconstruction objectives without requiring precise tuning of the β hyperparameter.

C. Interpretation

In this part, we aim to interpret the learned system, especially the control module. We first randomly draw 20 samples from each task separately. Next, we sum up and normalize the absolute values of each row of the control matrix, which measures the intensity of the response generated by the corresponding region given the same input u. Then we map and visualize the top-weighted regions for the different tasks.
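The region-ranking procedure can be sketched as follows; pooling the M = N x H measurement rows back to N regions by summation is our assumption about how measurement rows are mapped to regions, since they remain region-trackable by construction.

```python
import numpy as np

def region_control_intensity(C, N, H, top_k=10):
    """Rank brain regions by the intensity of their response to the control input u.

    C : (M, M) control matrix of one subject, with M = N * H rows.
    """
    row_strength = np.abs(C).sum(axis=1)                       # sum |C_ij| over each row
    region_strength = row_strength.reshape(N, H).sum(axis=1)   # pool measurements per region
    region_strength = region_strength / region_strength.sum()  # normalize to a distribution
    return np.argsort(region_strength)[::-1][:top_k]           # indices of top-weighted regions
```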

As shown in Fig. 5, we can identify the top-10 critical brain regions associated with each task, which align well with their respective cognitive functions. For instance, the CARIT task demonstrates strong activation in the temporal region and prefrontal cortex, indicating integrated processing between auditory processing, semantic memory, and executive control functions. The FACENAME task shows significant activation in the prefrontal cortex and temporal lobe, reflecting its involvement in memory encoding, retrieval, and facial recognition processes. The VISMOTOR task predominantly activates the motor and visual regions, emphasizing its role in visuomotor integration. The consistent activation patterns across subjects demonstrate both the specificity of regional recruitment for different cognitive demands and the integration of distributed networks for complex task execution. These findings support the functional specialization of brain networks for specific tasks and provide a foundation for further investigation into the interactions between these regions.

Fig. 5:

Critical brain regions (top-10) linked to three brain tasks: CARIT (left), FACENAME (middle), and VISMOTOR (right).

V. Discussion

In this section, we aim to discuss both the potential limitations of the current method as well as the potential applications.

One of the major considerations of our approach is the fitness of the selected/learnt measurements. The quality of linearization depends critically on choosing appropriate observable functions, which is often problem-specific and non-trivial. Here we use biologically inspired measurements and can only validate them empirically, with no strong mathematical guarantee. Another potential limitation is that finite-dimensional approximations may not "close" (i.e., remain within the Koopman invariant subspace). With measurements that cannot form a Koopman invariant subspace, the prediction error grows quickly in long-term prediction. For both limitations, we use the proposed control module to regulate the dynamics and avoid rapid divergence. However, this method has no rigorous guarantee of eliminating the limitations.

We further discuss the potential applications of BRICK. First, the current task-specific control module could be adapted to characterize disorder-specific deviations in brain dynamics, potentially serving as diagnostic biomarkers for conditions like Alzheimer's disease, schizophrenia, or autism spectrum disorder. Second, our framework could be extended to create "disease fingerprints" by analyzing how the Koopman operator and control matrices differ between healthy controls and patient populations. Specifically, we can learn two Koopman matrices $K_{nc}$ and $K_{patient}$ from healthy controls and patient populations and study their properties, such as the eigenvalue distribution.

VI. Conclusion

This work presents BRICK (BRain dynamics Identification using Control and Koopman), a novel deep learning framework for understanding brain functional dynamics through the lens of controlled dynamical systems. Our approach addresses several fundamental challenges in modeling complex brain dynamics: First, we leverage Koopman operator theory to transform non-linear brain dynamics into a linear evolution in a higher-dimensional measurement space, providing both mathematical tractability and the ability to capture complex nonlinear behaviors. This linearization enables systematic analysis while preserving the rich dynamics of neural processes. Second, we introduce a sophisticated measurement generation scheme that combines functional connectivity patterns with spatial information through permutation-equivariant neural networks. This approach generates biologically meaningful measurements that form an effective Koopman invariant subspace, capturing both local and global patterns in brain activity. Third, we develop a task-related control module that adapts to individual differences while maintaining consistent underlying dynamics across subjects. This innovation allows the model to account for both shared neural mechanisms and individual variations in brain function.

Our comprehensive experiments on both synthetic data (Lorenz system) and real fMRI datasets (ABIDE and HCP-A) demonstrate BRICK’s superior performance compared to existing methods. The ablation studies reveal the complementary nature of our model’s components, with the control module proving particularly effective for task-related dynamics and the initial condition module excelling in capturing resting-state patterns. This suggests that comprehensive brain dynamics modeling requires both task-specific control mechanisms and sophisticated state representation capabilities.

Acknowledgments

This work was supported by the National Institutes of Health AG070701, AG073927, AG068399, AG084375, and Foundation of Hope.

Appendix

A. Implementation Details about BRICK

1). Neural Koopman Operator:

During training, we do not directly optimize the Koopman operator $K$. Instead, we utilize its eigendecomposition to enhance training efficiency. Specifically, we first denote $\tilde{u}_0 = g_0$ and $\tilde{u}_t = C u_t$ for simplicity. We can then notice that $g_{t+1} = \sum_{i=0}^{t+1} K^i \tilde{u}_{t+1-i}$. By eigendecomposition, $K = P \Lambda P^{-1}$, where $P$ and $\Lambda$ are complex. By left-multiplying $P^{-1}$ on both sides, we obtain $\bar{g}_{t+1} = \sum_{i=0}^{t+1} \Lambda^i \bar{u}_{t+1-i}$, where $\bar{g}_t = P^{-1} g_t$ and $\bar{u}_t = P^{-1} \tilde{u}_t$. For this form of dynamical system, we can use a mature algorithm termed parallel scan [11] to reduce the running time complexity to $O(\log t)$. Finally, when mapping back to the original state space via Eq. 2, $\hat{x}_{t+1} = W_x g_{t+1}$, we can reformulate it as $\hat{x}_{t+1} = \bar{W}_x \bar{g}_{t+1}$, where $\bar{W}_x = W_x P$.
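The sketch below spells out this diagonalized recurrence in its sequential form (the actual implementation replaces the loop with a parallel scan to reach O(log t) depth); variable names are illustrative.

```python
import numpy as np

def diagonalized_rollout(K, u_tilde, g0):
    """Roll out g_t = K g_{t-1} + u~_t via the eigendecomposition K = P Lambda P^{-1}.

    K       : (M, M) Koopman operator.
    u_tilde : (T, M) effective inputs u~_t = C u_t for t = 1..T.
    g0      : (M,)   initial measurement vector (u~_0).
    """
    lam, P = np.linalg.eig(K)                              # K = P diag(lam) P^{-1}
    P_inv = np.linalg.inv(P)
    u_bar = (P_inv @ np.vstack([g0, u_tilde]).T).T         # rotated inputs: u_bar_t = P^{-1} u~_t
    g_bar = np.zeros_like(u_bar)
    for t in range(len(u_bar)):                            # sequential form of the scan
        g_bar[t] = (lam * g_bar[t - 1] if t > 0 else 0) + u_bar[t]   # g_bar_t = Lambda g_bar_{t-1} + u_bar_t
    return (P @ g_bar.T).T.real                            # back to measurements: g_t = P g_bar_t
```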

2). Control Module:

We first encode the observations into embeddings using Transformer encoders, which effectively capture temporal dependencies in the BOLD signals. These embeddings are then processed through a batch normalization layer and fed into both (1) a classification head that predicts task states, and (2) a control gate generation function using a tanh activation, which implements the diagonal control matrix C described in the paper. We chose this approach for parameter efficiency and to prevent overfitting, as mentioned in Sec. III-C.

B. Loss during Training

We plot the training loss and validation loss for the first 500 epochs in Fig. 7, which shows the smooth optimization dynamics of BRICK.

Fig. 7:

Training loss and validation loss during training. The losses are processed by a log() function. The first 500 epochs are plotted for better visualization.

Footnotes

1

The Lorenz-63 system is a simplified mathematical model of atmospheric convection developed by Edward Lorenz in 1963. It is a set of three coupled, non-linear ordinary differential equations that describe the evolution of three variables over time. The system is often used as a paradigm for chaotic systems in dynamical systems theory.

Contributor Information

Zhixuan Zhou, Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Computer Science. Department of Statistics and Operations Research (STOR), Carolina Institute for Developmental Disabilities, and the UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA..

Tingting Dan, Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.

Guorong Wu, Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Computer Science. Department of Statistics and Operations Research (STOR), Carolina Institute for Developmental Disabilities, and the UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA..

References

  • [1] Bassett DS and Sporns O, "Network neuroscience," Nature Neuroscience, vol. 20, no. 3, pp. 353–364, 2017.
  • [2] Bressler SL and Menon V, "Large-scale brain networks in cognition: emerging methods and principles," Trends in Cognitive Sciences, vol. 14, no. 6, pp. 277–290, 2010.
  • [3] Telesford QK, Simpson SL, Burdette JH, Hayasaka S, and Laurienti PJ, "The brain as a complex system: using network science as a tool for understanding the brain," Brain Connectivity, vol. 1, no. 4, pp. 295–308, 2011.
  • [4] Botvinick MM and Cohen JD, "The computational and neural basis of cognitive control: charted territory and new frontiers," Cognitive Science, vol. 38, no. 6, pp. 1249–1285, 2014.
  • [5] McGowan AL, Parkes L, He X, Stanoi O, Kang Y, Lomax S, Jovanova M, Mucha PJ, Ochsner KN, Falk EB, et al., "Controllability of structural brain networks and the waxing and waning of negative affect in daily life," Biological Psychiatry Global Open Science, vol. 2, no. 4, pp. 432–439, 2022.
  • [6] Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, Medaglia JD, Vettel JM, Miller MB, Grafton ST, et al., "Controllability of structural brain networks," Nature Communications, vol. 6, no. 1, p. 8414, 2015.
  • [7] Kim TD, Luo TZ, Pillow JW, and Brody CD, "Inferring latent dynamics underlying neural population activity via neural differential equations," in International Conference on Machine Learning, pp. 5551–5561, PMLR, 2021.
  • [8] Pandarinath C, O'Shea DJ, Collins J, Jozefowicz R, Stavisky SD, Kao JC, Trautmann EM, Kaufman MT, Ryu SI, Hochberg LR, et al., "Inferring single-trial neural population dynamics using sequential auto-encoders," Nature Methods, vol. 15, no. 10, pp. 805–815, 2018.
  • [9] Koopman BO, "Hamiltonian systems and transformation in Hilbert space," Proceedings of the National Academy of Sciences, vol. 17, no. 5, pp. 315–318, 1931.
  • [10] Koopman BO and Neumann J v, "Dynamical systems of continuous spectra," Proceedings of the National Academy of Sciences, vol. 18, no. 3, pp. 255–263, 1932.
  • [11] Orvieto A, Smith SL, Gu A, Fernando A, Gulcehre C, Pascanu R, and De S, "Resurrecting recurrent neural networks for long sequences," in International Conference on Machine Learning, pp. 26670–26698, PMLR, 2023.
  • [12] Mezić I, "Spectral properties of dynamical systems, model reduction and decompositions," Nonlinear Dynamics, vol. 41, no. 1–3, pp. 309–325, 2005.
  • [13] Nathan Kutz J, Proctor JL, and Brunton SL, "Applied Koopman theory for partial differential equations and data-driven modeling of spatio-temporal systems," Complexity, vol. 2018, no. 1, p. 6010634, 2018.
  • [14] Wang R, Dong Y, Arik SÖ, and Yu R, "Koopman neural forecaster for time series with temporal distribution shifts," arXiv preprint arXiv:2210.03675, 2022.
  • [15] Williams MO, Kevrekidis IG, and Rowley CW, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, pp. 1307–1346, 2015.
  • [16] Brunton SL, Proctor JL, and Kutz JN, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.
  • [17] Koppe G, Toutounji H, Kirsch P, Lis S, and Durstewitz D, "Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI," PLoS Computational Biology, vol. 15, no. 8, p. e1007263, 2019.
  • [18] Brenner M, Hess F, Mikhaeil JM, Bereska LF, Monfared Z, Kuo P-C, and Durstewitz D, "Tractable dendritic RNNs for reconstructing nonlinear dynamical systems," in International Conference on Machine Learning, pp. 2292–2320, PMLR, 2022.
  • [19] Bengio S, Vinyals O, Jaitly N, and Shazeer N, "Scheduled sampling for sequence prediction with recurrent neural networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
  • [20] Gallos IK, Lehmberg D, Dietrich F, and Siettos C, "Data-driven modelling of brain activity using neural networks, diffusion maps, and the Koopman operator," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 34, no. 1, 2024.
  • [21] Marrouch N, Slawinska J, Giannakis D, and Read HL, "Data-driven Koopman operator approach for computational neuroscience," Annals of Mathematics and Artificial Intelligence, vol. 88, no. 11–12, pp. 1155–1173, 2020.
  • [22] Turja MA, Styner M, and Wu G, "DeepGraphDMD: Interpretable spatio-temporal decomposition of non-linear functional brain network dynamics," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 358–368, Springer, 2023.
  • [23] Gupta G, Xiao X, and Bogdan P, "Multiwavelet-based operator learning for differential equations," Advances in Neural Information Processing Systems, vol. 34, pp. 24048–24062, 2021.
  • [24] Xiao X, Cao D, Yang R, Gupta G, Liu G, Yin C, Balan R, and Bogdan P, "Coupled multiwavelet operator learning for coupled differential equations," in The Eleventh International Conference on Learning Representations, 2022.
  • [25] Gupta G, Xiao X, Balan R, and Bogdan P, "Non-linear operator approximations for initial value problems," in International Conference on Learning Representations (ICLR), 2022.
  • [26] Brunton SL, Brunton BW, Proctor JL, and Kutz JN, "Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control," PLoS ONE, vol. 11, no. 2, p. e0150171, 2016.
  • [27] Brunton SL, Brunton BW, Proctor JL, Kaiser E, and Kutz JN, "Chaos as an intermittently forced linear system," Nature Communications, vol. 8, no. 1, p. 19, 2017.
  • [28] Van Den Heuvel MP and Pol HEH, "Exploring the brain network: a review on resting-state fMRI functional connectivity," European Neuropsychopharmacology, vol. 20, no. 8, pp. 519–534, 2010.
  • [29] Schober P, Boer C, and Schwarte LA, "Correlation coefficients: appropriate use and interpretation," Anesthesia & Analgesia, vol. 126, no. 5, pp. 1763–1768, 2018.
  • [30] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, and Polosukhin I, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [31] Gilmer J, Schoenholz SS, Riley PF, Vinyals O, and Dahl GE, "Neural message passing for quantum chemistry," in International Conference on Machine Learning, pp. 1263–1272, PMLR, 2017.
  • [32] Kipf TN and Welling M, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
  • [33] Geneva N and Zabaras N, "Transformers for modeling physical systems," Neural Networks, vol. 146, pp. 272–289, 2022.
  • [34] Zeng A, Chen M, Zhang L, and Xu Q, "Are transformers effective for time series forecasting?," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 11121–11128, 2023.
  • [35] Craddock C, Benhajali Y, Chu C, Chouinard F, Evans A, Jakab A, Khundrakpam BS, Lewis JD, Li Q, Milham M, et al., "The Neuro Bureau preprocessing initiative: open sharing of preprocessed neuroimaging data and derivatives," Frontiers in Neuroinformatics, vol. 7, no. 27, p. 5, 2013.
  • [36] Craddock RC, James GA, Holtzheimer III PE, Hu XP, and Mayberg HS, "A whole brain fMRI atlas generated via spatially constrained spectral clustering," Human Brain Mapping, vol. 33, no. 8, pp. 1914–1928, 2012.
  • [37] Bookheimer SY, Salat DH, Terpstra M, Ances BM, Barch DM, Buckner RL, Burgess GC, Curtiss SW, Diaz-Santos M, Elam JS, et al., "The Lifespan Human Connectome Project in Aging: an overview," Neuroimage, vol. 185, pp. 335–348, 2019.
  • [38] Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, and Joliot M, "Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain," Neuroimage, vol. 15, no. 1, pp. 273–289, 2002.
  • [39] Dan T, Wei Z, Kim WH, and Wu G, "Exploring the enigma of neural dynamics through a scattering-transform mixer landscape for Riemannian manifold," in Forty-first International Conference on Machine Learning, 2024.
  • [40] Loshchilov I, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
  • [41] Loshchilov I and Hutter F, "SGDR: Stochastic gradient descent with warm restarts," arXiv preprint arXiv:1608.03983, 2016.
