Explainable Multimodal Deep Dictionary Learning to Capture Developmental Differences from Three fMRI Paradigms

Lan Yang; Chen Qiao; Huiyu Zhou; Vince D Calhoun; Julia M Stephen; Tony W Wilson; Yuping Wang

doi:10.1109/TBME.2023.3244921

Author manuscript; available in PMC: 2024 Aug 1.

Published in final edited form as: IEEE Trans Biomed Eng. 2023 Jul 18;70(8):2404–2415. doi: 10.1109/TBME.2023.3244921


PMCID: PMC11045007 NIHMSID: NIHMS1918682 PMID: 37022875

Abstract

Objective:

Multimodal-based methods show great potential for neuroscience studies by integrating complementary information. However, relatively little multimodal work has focused on brain developmental changes.

Methods:

We propose an explainable multimodal deep dictionary learning method to uncover both the commonality and specificity of different modalities, which learns the shared dictionary and the modality-specific sparse representations based on the multimodal data and their encodings of a sparse deep autoencoder.

Results:

By regarding three fMRI paradigms collected during two tasks and resting state as modalities, we apply the proposed method on multimodal data to identify the brain developmental differences. The results show that the proposed model can not only achieve better performance in reconstruction, but also yield age-related differences in reoccurring patterns. Specifically, both children and young adults prefer to switch among states during two tasks while staying within a particular state during rest, but the difference is that children possess more diffuse functional connectivity patterns while young adults have more focused functional connectivity patterns.

Conclusion and Significance:

To uncover the commonality and specificity of three fMRI paradigms to developmental differences, multimodal data and their encodings are used to train the shared dictionary and the modality-specific sparse representations. Identifying brain network differences helps to understand how the neural circuits and brain networks form and develop with age.

Index Terms: Explainability, Multimodal dictionary learning, Dynamic functional connectivity, Brain development

I. Introduction

Normal brain development is a complex process, from the establishment of basic cognitive functions in childhood to the gradual maturation of more complex self-regulatory functions throughout adolescence [1]–[3]. Functional magnetic resonance imaging (fMRI) can capture hemodynamic responses to neuronal activity by measuring the blood oxygenation level-dependent (BOLD) signal, based on which developmental changes in neural interaction and integration between functionally interconnected regions can be revealed [4]. Compared with BOLD signals, dynamic functional connectivity (dFC) measured by a sliding window approach can reflect time-varying dependencies between spatially separated brain regions. It helps to quantify how the correlation strength between the functional activities of paired brain regions changes over time. Thus, there has recently been growing interest in identifying recurring whole-brain functional connectivity patterns (i.e., states) based on dFC. These studies aim to divide the whole-brain dFC profiles into distinct states observed reliably across subjects throughout the fMRI scans [5]–[8]. This enables us to investigate the differences in states related to brain development, capture the transition mechanism among these states, and provide insights into neural brain dynamics from the perspective of functional connectivity [4], [5].

Compared with single-modality methods for fMRI analysis, multimodal-based methods can take advantage of complementary information provided by different modalities. Studies have shown that integrating the multimodal prior or combining the complementary information from diverse modalities can promote model enhancement and diagnosis [9]–[11]. Many methods have been extended for multimodal data integration, including multitask learning, linear regression, neural networks, support vector machines, and dictionary learning [9]–[14]. Due to their ability to reduce dimensionality and identify the reoccurring patterns of dFC [4], multimodal dictionary learning methods have attracted considerable attention. For example, Li et al. [11] proposed a multimodal discriminative dictionary learning (mSCDDL) method based on a weighted combination strategy, and further applied it to fuse information from structural magnetic resonance imaging and fluorodeoxyglucose positron emission tomography for Alzheimer's disease classification. In [15], an $ℓ_{1}$-norm regularized dictionary learning approach was proposed to identify the epilepsy-related dFC states, where the time courses representative of epileptic activity extracted by electroencephalogram are incorporated into the fMRI for dFC state analysis. In [16], multimodal dictionary learning was applied to the diagnosis of schizophrenia, which embeds the correlation information of multimodal data into the learning model. Additionally, to capture the nonlinearity or higher-level features of data, Li et al. [10] improved the mSCDDL with the multi-feature kernel trick to obtain the nonlinear representations of data. D'Souza et al. [17] proposed a framework for the diagnosis of Autism Spectrum Disorder, which couples a structurally-regularized dynamic dictionary learning model (sr-DDL) with a deep network to predict behavioral scores, where the dFCs of fMRI were decomposed by sr-DDL while constraining the decomposition by the FCs of diffusion tensor imaging.

Of particular note, the aforementioned methods either fail to uncover both the commonality and the specificity of different modalities, or overlook the nonlinear higher-level features of data, or have difficulty in explainability (i.e., they fail to identify the reoccurring patterns of dFC, or the brain regions and FCs related to development or disease). To address these issues, we propose an explainable multimodal deep dictionary learning (EMDDL) method, which connects the multimodal dictionary learning in the original space and the encoding space through a sparse deep autoencoder (sDAE). Within this framework, all modalities share the same dictionary to reveal the inherent commonality. To achieve the specificity of each modality, the Fisher cost is used to constrain the sparse representations due to its ability to learn the modality-specific features by avoiding the overlap of neighboring pairs between different modalities. Moreover, the shared dictionary and the modality-specific sparse representations are learned based on the multimodal data and their encodings of the sDAE. In this way, multimodal dictionary learning can attain the nonlinear higher-level features while reconstructing the original data for identifying the reoccurring patterns or functional connectivity related to development. To maintain the complex relationships among subjects, a hypergraph Laplacian regularization is used, which helps to enhance the learning ability through prior knowledge.

We apply EMDDL to the multimodal data from the Philadelphia Neurodevelopmental Cohort (PNC) to recognize the developmental differences between children and young adults, where the three fMRI paradigms collected during two tasks and resting state are regarded as modalities. We found that both children and young adults tend to switch frequently among states during the two tasks and stay within a particular state during rest. The main difference is that children have more diffuse functional connectivity patterns while young adults possess more focused functional connectivity patterns under the three fMRI paradigms. Besides, the differences in functional connectivity between children and young adults are mainly related to information processing, cognition, emotion, and working memory under the three fMRI paradigms.

II. Preliminary Work

In this section, some preliminary work is presented, including hypergraph learning to preserve the higher-order relationships among subjects and the Fisher cost to extract modality-specific features.

A. Hypergraph Learning

Given that the traditional graph learning loses information inevitably by squeezing the complex relationships into pairwise ones, hypergraph has been widely applied to identify the high-order relationships among subjects [18], [19]. Generally, a hypergraph $G = (V, E, W)$ consists of three parts, namely, the vertex set $V = \{V_{i} ∣ i = 1, 2, \dots, N_{v}\}$ , the hyperedge set $E = \{E_{i} ∣ i = 1, 2, \dots, N_{e}\}$ and the hyperedge weight $W = \{W_{i} ∣ i = 1, 2, \dots, N_{e}\}$ . To represent the relationships between hyperedges and vertices, the incidence matrix $H \in ℝ^{N_{v} \times N_{e}}$ of hypergraph $G$ is defined as

$$H(V_{i}, E_{j}) = \begin{cases} 1, & V_{i} \in E_{j} \\ 0, & \text{otherwise} \end{cases}$$

where the $(i, j)$-th entry of $H$ denotes whether the $i$-th vertex belongs to the $j$-th hyperedge. Based on the incidence matrix $H$, the degree of the $i$-th vertex $d_{V_{i}} = \sum_{E_{j} \in E} W_{j} H(V_{i}, E_{j})$ and the degree of the $i$-th hyperedge $d_{E_{i}} = \sum_{V_{j} \in V} H(V_{j}, E_{i})$ can be obtained. The diagonal matrices $D_{v} \in ℝ^{N_{v} \times N_{v}}$ and $D_{e} \in ℝ^{N_{e} \times N_{e}}$ are then composed of the degrees of all vertices and all hyperedges respectively. Specifically, the $i$-th diagonal elements of $D_{v}$ and $D_{e}$ are $d_{V_{i}}$ and $d_{E_{i}}$ respectively.

To construct a hypergraph, the $k$ nearest neighbor strategy is usually applied because the geometric structure relationship among data can be approximately represented by the nearest neighbor graph [18], [20]. Specifically, for a chosen vertex, the distances between the chosen vertex and the other vertices are calculated, and then the $k$ nearest vertices are connected by a hyperedge. The weight of the $i$-th hyperedge is $W_{i} = \frac{1}{k(k - 1)} \sum_{\{V_{j}, V_{l}\} \in E_{i}} \exp\left(- \frac{\|V_{j} - V_{l}\|_{2}^{2}}{σ_{i}}\right)$, where $σ_{i} = \frac{\sum_{\{V_{j}, V_{l}\} \in E_{i}} \|V_{j} - V_{l}\|_{2}^{2}}{k(k - 1)}$. To obtain the diagonal matrix $W_{h} \in ℝ^{N_{e} \times N_{e}}$, the hyperedge weight $W_{i}$ is placed as the $i$-th diagonal element of $W_{h}$. By analogy with the definition of a simple graph Laplacian matrix [21], the hypergraph Laplacian matrix is defined as

$$L^{h} = D_{v} - S \tag{1}$$

where $S = H W_{h} D_{e}^{- 1} H^{T}$ is the similarity matrix to define the similarity between each pair of vertices.

Compared with the traditional graph Laplacian regularization, hypergraph Laplacian regularization has the characteristics of preserving complex local geometric structure and incorporating the higher-order relationships among subjects, which are conducive to classification or clustering tasks in FC or dFC analysis [19].
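The construction in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes one hyperedge per vertex, and it assumes that each hyperedge contains the chosen vertex together with its $k - 1$ nearest neighbors, so that a hyperedge has $k$ members and $k(k - 1)$ ordered pairs, which matches the normalization of $W_{i}$ and $σ_{i}$. The function name `hypergraph_laplacian` is hypothetical.

```python
import numpy as np

def hypergraph_laplacian(X, k):
    """Sketch of L^h = D_v - H W_h D_e^{-1} H^T for samples stored as rows of X.

    Assumption: hyperedge i = vertex i plus its k-1 nearest neighbours (k members).
    """
    N = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    H = np.zeros((N, N))          # incidence matrix, N_e = N hyperedges
    w = np.zeros(N)               # hyperedge weights W_i
    off = ~np.eye(k, dtype=bool)  # mask selecting ordered pairs with j != l
    for i in range(N):
        order = np.argsort(dist2[i])
        neighbours = order[order != i][: k - 1]
        members = np.concatenate(([i], neighbours))
        H[members, i] = 1.0
        pair = dist2[np.ix_(members, members)][off]
        sigma = pair.sum() / (k * (k - 1))
        w[i] = np.exp(-pair / sigma).sum() / (k * (k - 1))

    d_v = H @ w                   # vertex degrees: sum_j W_j H(V_i, E_j)
    d_e = H.sum(axis=0)           # hyperedge degrees: sum_j H(V_j, E_i)
    S = H @ np.diag(w / d_e) @ H.T
    return np.diag(d_v) - S, H, w
```

With this definition each row of $L^{h}$ sums to zero and $L^{h}$ is symmetric positive semidefinite, which is what makes $tr(V L V^{T})$ usable as a smoothness penalty.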

B. Fisher cost

The Fisher discrimination criterion clusters the samples within the same modality and keeps the samples of different modalities as far away from each other as possible, which helps to extract features corresponding to the specific modality [22]–[24]. Assume that the multimodal data $X = (x_{1}, x_{2}, \dots, x_{N}) \in ℝ^{p \times N}$ contain $M$ modalities with $N_{m}$ samples belonging to the $m$-th modality and $\sum_{m = 1}^{M} N_{m} = N$, where the $p$-dimensional vector $x_{n}$ is the $n$-th sample of $X$. With a slight abuse of notation, $N_{m}$ also denotes the index set of the samples in the $m$-th modality, as in $n \in N_{m}$ below. The within-modality scatter matrix $S_{w}$ and the between-modality scatter matrix $S_{b}$ of samples are defined as

$$S_{w}(X) = \sum_{m = 1}^{M} \sum_{n \in N_{m}} (x_{n} - μ_{m}) (x_{n} - μ_{m})^{T}$$

$$S_{b}(X) = \sum_{m = 1}^{M} N_{m} (μ_{m} - μ) (μ_{m} - μ)^{T}$$

where $μ_{m} = \frac{1}{N_{m}} \sum_{n \in N_{m}} x_{n}$ and $μ = \frac{1}{N} \sum_{n = 1}^{N} x_{n}$ are the modality mean and the overall mean respectively. Then, the Fisher cost is as follows

$$F(X) = \mathrm{tr}(S_{w}(X)) - \mathrm{tr}(S_{b}(X)) + \|X\|_{F}^{2}$$

in which the Frobenius norm $∥ \cdot ∥_{F}$ is to ensure the convexity of the cost function [24].

To get a more concise expression and facilitate calculation [22], the Fisher cost $F (X)$ can be rewritten as

$$F(X) = \mathrm{tr}(X F X^{T}) \tag{2}$$

where $F = 2 I - 2 F_{1} + F_{2} \in ℝ^{N \times N}$ with $I \in ℝ^{N \times N}$ being the identity matrix, $F_{1} \in ℝ^{N \times N}$ being defined as

$$F_{1}(i, j) = \begin{cases} \frac{1}{N_{m}}, & i, j \in N_{m} \\ 0, & \text{otherwise} \end{cases}$$

and $F_{2} \in ℝ^{N \times N}$ with each component of it being $1 / N$ .
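The equivalence between the scatter-matrix form of the Fisher cost and the compact form (2) can be checked numerically. The sketch below is only an illustration of that identity; the function names `fisher_cost_direct` and `fisher_matrix` and the integer `labels` vector (one modality label per column of $X$) are hypothetical conveniences, not part of the paper.

```python
import numpy as np

def fisher_cost_direct(X, labels):
    """tr(S_w) - tr(S_b) + ||X||_F^2, with samples stored as columns of X."""
    mu = X.mean(axis=1, keepdims=True)            # overall mean
    tr_sw, tr_sb = 0.0, 0.0
    for m in np.unique(labels):
        Xm = X[:, labels == m]
        mu_m = Xm.mean(axis=1, keepdims=True)     # modality mean
        tr_sw += np.sum((Xm - mu_m) ** 2)
        tr_sb += Xm.shape[1] * np.sum((mu_m - mu) ** 2)
    return tr_sw - tr_sb + np.sum(X ** 2)

def fisher_matrix(labels):
    """F = 2I - 2F_1 + F_2 as defined around Eq. (2)."""
    N = labels.size
    same = labels[:, None] == labels[None, :]     # True when i, j share a modality
    counts = same.sum(axis=1)                     # N_m for the modality of sample i
    F1 = same / counts[:, None]
    F2 = np.full((N, N), 1.0 / N)
    return 2.0 * np.eye(N) - 2.0 * F1 + F2
```

For any data matrix and labeling, `fisher_cost_direct(X, labels)` and `np.trace(X @ fisher_matrix(labels) @ X.T)` should agree up to floating-point error.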

III. Methodology

This section presents the details of EMDDL, which learns the shared dictionary and the modality-specific sparse representations in both the original space and the encoding space, together with the corresponding optimization algorithm.

A. Explainable Multimodal Deep Dictionary Learning

Multimodal dictionary learning methods can not only embed the high-dimensional features into low-dimensional space, but also boost learning performance with the combination of multiple modalities [12]. However, most of the existing methods either cannot simultaneously reveal the inherent commonality and specificity of different modalities, or overlook the nonlinear higher-level features of data, or have difficulty in explainability. To address these problems, we propose EMDDL which couples multimodal dictionary learning with sDAE. Specifically, by sharing the same dictionary through all modalities to capture the inherent commonality and constraining sparse representations with Fisher cost to obtain the specificity of each modality, the inherent commonality and the specificity of different modalities can be concurrently achieved in multimodal dictionary learning. Moreover, to achieve the nonlinear higher-level features of data and reconstruct the original data to identify the developmental differences in reoccurring patterns or FCs, both the shared dictionary and the modality-specific sparse representations are learned not only in the original space, but also in the encoding space of the sDAE at the same time. By alternating minimization algorithms, the sDAE, dictionary, and sparse representations can be sequentially obtained. The flowchart of EMDDL is shown in Fig. 1.

Suppose that there are $M$ modalities with $N_{m}$ samples belonging to the $m$ -th modality $N_{m}$ and the training data $X = (X_{(1)}, X_{(2)}, \dots, X_{(M)}) \in ℝ^{p \times N}$ is composed of these $M$ modalities, where $N = \sum_{m = 1}^{M} N_{m}$ and the $m$ -th modality is $X_{(m)} = (x_{1}^{(m)}, x_{2}^{(m)}, \dots, x_{N_{m}}^{(m)}) \in ℝ^{p \times N_{m}}$ . Besides, the sDAE contains $2 L + 1$ layers with $r^{(l)}$ neurons in the $l$ -th layer and $r^{(2 L - l)} = r^{(l)}$ holds for $l = 0, 1, \dots, 2 L$ .

EMDDL contains two parts, $J_{sDAE}$ and $J_{MDL}$, where $J_{sDAE}$ is to efficiently learn the nonlinear higher-level features of data and $J_{MDL}$ is to train multimodal dictionary learning in both the original space and the encoding space. The objective function of EMDDL is defined as

$$\min_{\{\tilde{W}^{(l)}\}_{l = 1}^{2L},\, D,\, V} J_{obj} = J_{sDAE} + J_{MDL} \quad \text{s.t.} \quad \|d_{k}\|_{2}^{2} \leq 1,\ \forall k = 1, 2, \dots, K \tag{3}$$

where $J_{sDAE}$ and $J_{MDL}$ are defined as

$$J_{sDAE} = J_{recon} + λ_{1} J_{KL} + λ_{2} J_{W_{F}} = \frac{1}{2N} \|X^{(2L)} - X\|_{F}^{2} + λ_{1} \sum_{l = 1}^{2L - 1} \sum_{j = 1}^{r^{(l)}} \mathrm{KL}(ρ \,\|\, ρ_{j}^{(l)}) + \frac{λ_{2}}{2} \sum_{l = 1}^{2L} \|\tilde{W}^{(l)}\|_{F}^{2} \tag{4}$$

$$\begin{aligned} J_{MDL} &= J_{MDL_{O}} + J_{MDL_{E}} + λ_{3} J_{Fisher} + λ_{4} J_{hyperL} + λ_{5} J_{V_{F}} + λ_{6} J_{V_{ℓ_{1}}} \\ &= \frac{1}{2N} \|X - D V\|_{F}^{2} + \frac{1}{2N} \|X^{(L)} - D^{(L)} V\|_{F}^{2} + \frac{λ_{3}}{2} \mathrm{tr}(V H V^{T}) + \frac{λ_{4}}{2} \mathrm{tr}(V L V^{T}) + \frac{λ_{5}}{2} \|V\|_{F}^{2} + λ_{6} \|V\|_{1} \end{aligned} \tag{5}$$

where $X^{(2L)} \in ℝ^{r^{(2L)} \times N}$ is the reconstruction of the input data $X$ by the sDAE, $D = (d_{1}, d_{2}, \dots, d_{K}) \in ℝ^{p \times K}$ is the dictionary with $K$ atoms in the original space, and $X^{(L)} \in ℝ^{r^{(L)} \times N}$ and $D^{(L)} \in ℝ^{r^{(L)} \times K}$ are the encodings of $X$ and $D$ respectively (i.e., their outputs in the $L$-th layer). $V = (V_{(1)}, V_{(2)}, \dots, V_{(M)}) \in ℝ^{K \times N}$ is composed of the blocks $V_{(m)} = (v_{1}^{(m)}, v_{2}^{(m)}, \dots, v_{N_{m}}^{(m)}) \in ℝ^{K \times N_{m}}$, where $V_{(m)}$ is the sparse representation corresponding to the $m$-th modality in both the original space and the encoding space. $\tilde{W}^{(l)} = (W^{(l)}, b^{(l)}) ≜ \{\tilde{W}_{ij}^{(l)}\} \in ℝ^{r^{(l)} \times (r^{(l - 1)} + 1)}$ for $l = 1, 2, \dots, 2L$, in which $W^{(l)} \in ℝ^{r^{(l)} \times r^{(l - 1)}}$ and $b^{(l)} \in ℝ^{r^{(l)}}$ are the connection weight matrix and the bias of the sDAE between the $l$-th layer and the $(l - 1)$-th layer respectively. As defined in Appendix I-A, the Kullback-Leibler divergence $\mathrm{KL}(ρ \,\|\, ρ_{j}^{(l)})$ measures the difference between two Bernoulli distributions with means $ρ$ and $ρ_{j}^{(l)}$, where $ρ$ is a sparsity hyperparameter and $ρ_{j}^{(l)}$ is the average activation of neuron $j$ in the $l$-th layer of the sDAE. Similar to the definition of $F$ in (2), $H \in ℝ^{N \times N}$ is given by $H = I - 2 H_{1} + H_{2}$, where $I \in ℝ^{N \times N}$ is the identity matrix and $H_{1} \in ℝ^{N \times N}$ is defined as

$$H_{1}(i, j) = \begin{cases} \frac{1}{N_{m}}, & i, j \in N_{m} \\ 0, & \text{otherwise} \end{cases}$$

and each component of $H_{2} \in ℝ^{N \times N}$ is $1 / N$. $L \in ℝ^{N \times N}$ consists of the hypergraph Laplacian matrices of all modalities and is defined as

$$L = \begin{pmatrix} L_{(1)}^{h} & 0 & \dots & 0 \\ 0 & L_{(2)}^{h} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & L_{(M)}^{h} \end{pmatrix}$$

where $L_{(m)}^{h} \in ℝ^{N_{m} \times N_{m}}$ , the hypergraph Laplacian matrix of the $m$ -th modality, is defined by (1). $∥ V ∥_{1} = \sum_{n = 1}^{N} \sum_{k = 1}^{K} |V_{n k}|$ with $V_{n k}$ being the $k$ -th element of the $n$ -th column of the matrix $V$ .

In (4), $J_{recon}$ trains the sDAE by minimizing the error between the original data and its reconstruction. $J_{KL}$ prevents overfitting of the sDAE by controlling the activation of neurons. Compared with the $L_{1}$-norm and the $L_{2}$-norm, the Kullback-Leibler divergence has better sparsity ability, which helps to improve model performance; the details can be seen in Appendix I-A. $J_{W_{F}}$ prevents overfitting of the sDAE by controlling the weights. In (5), $J_{MDL_{O}}$ learns the shared dictionary of all modalities and the modality-specific sparse representations based on the original data. Meanwhile, by encoding the original data and the shared dictionary through the sDAE, $J_{MDL_{E}}$ performs the multimodal dictionary learning in the encoding space to capture the nonlinear higher-level features of data. Inspired by [25], we use the same sparse representations to characterize the local geometric relationships between data and dictionary in the original space and those between encoded data and encoded dictionary in the encoding space. In other words, our objective is to use the sparse representations to capture the intrinsic local geometric relationships between data and dictionary. This helps to learn a locality-sensitive dictionary, resulting in improved generalization ability in reconstruction or classification. By clustering the samples within modalities and separating the samples between modalities, $J_{Fisher}$ helps to learn the modality-specific representations. $J_{hyperL}$ is designed to retain the complex neighborhood relationships of samples hidden in each modality. $J_{V_{F}}$ guarantees the convexity of the Fisher cost and $J_{V_{ℓ_{1}}}$ ensures the sparsity. The constraint on the dictionary atoms prevents the sparse representations from becoming too small due to a large dictionary. In addition, the positive parameters $λ_{1}$, $λ_{2}$, $λ_{3}$, $λ_{4}$, $λ_{5}$ and $λ_{6}$ are used to balance the network fitting, the dictionary learning and the complexity of the model.

B. Optimization

The alternating minimization algorithm is applied to solve the problem (3) to optimize the parameters ${\{{\tilde{W}}^{(l)}\}}_{l = 1}^{2 L}$ , $D$ and $V$ , which contains three parts, i.e., the training of sDAE, the learning of the dictionary, and sparse representations learning.

Denote

$$\begin{cases} \tilde{h}_{n}^{(l)} = (h_{n}^{(l)}; 1) \\ z_{n}^{(l + 1)} = \tilde{W}^{(l + 1)} \tilde{h}_{n}^{(l)} \\ h_{n}^{(l + 1)} = φ(z_{n}^{(l + 1)}) \end{cases} \qquad l = 0, 1, \dots, 2L - 1$$

$$\begin{cases} \tilde{g}_{k}^{(l)} = (g_{k}^{(l)}; 1) \\ q_{k}^{(l + 1)} = \tilde{W}^{(l + 1)} \tilde{g}_{k}^{(l)} \\ g_{k}^{(l + 1)} = φ(q_{k}^{(l + 1)}) \end{cases} \qquad l = 0, 1, \dots, L - 1$$

where $φ$ is a differentiable activation function, which is the sigmoid function in this paper; $h_{n}^{(0)} = X_{n}$ and $g_{k}^{(0)} = d_{k}$, where $X_{n}$ is the $n$-th column of the multimodal data $X$ and $d_{k}$ is the $k$-th atom of the dictionary $D$. $X^{(l)} = (h_{1}^{(l)}, h_{2}^{(l)}, \dots, h_{N}^{(l)}) \in ℝ^{r^{(l)} \times N}$, $l = 1, 2, \dots, 2L$, and $D^{(l)} = (g_{1}^{(l)}, g_{2}^{(l)}, \dots, g_{K}^{(l)}) \in ℝ^{r^{(l)} \times K}$, $l = 1, 2, \dots, L$, are the outputs in the $l$-th layer when the input data are $X$ and $D$ respectively.
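The forward recursion above, in which a constant 1 is appended to each activation so that the bias is absorbed into $\tilde{W}^{(l)}$, can be sketched as follows. This is an illustrative sketch under stated assumptions: `W_tilde` is a hypothetical Python list holding $\tilde{W}^{(1)}, \dots, \tilde{W}^{(2L)}$, and the optional `upto` argument is introduced here only to stop at the encoding layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W_tilde, upto=None):
    """Propagate the columns of X through layers 1..upto of the sDAE.

    W_tilde[l-1] has shape (r_l, r_{l-1} + 1): weights with the bias as last column.
    Returns the list of layer outputs [X^(1), ..., X^(upto)].
    """
    H = X
    outputs = []
    for W in W_tilde[:upto]:
        H_aug = np.vstack([H, np.ones((1, H.shape[1]))])  # (h; 1)
        H = sigmoid(W @ H_aug)                            # h^(l+1) = phi(z^(l+1))
        outputs.append(H)
    return outputs
```

Under these assumptions, `forward(X, W_tilde)[-1]` plays the role of the reconstruction $X^{(2L)}$, while `forward(X, W_tilde, upto=L)[-1]` and `forward(D, W_tilde, upto=L)[-1]` give the encodings $X^{(L)}$ and $D^{(L)}$ that appear in $J_{MDL_{E}}$.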

1). The Training of sDAE:

To optimize the parameters of the sDAE, $\{\tilde{W}^{(l)}\}_{l = 1}^{2L}$, with fixed $D$ and $V$, problem (3) can be rewritten as

$$\min_{\{\tilde{W}^{(l)}\}_{l = 1}^{2L}} \frac{1}{2N} \left( \|X^{(2L)} - X\|_{F}^{2} + \|X^{(L)} - D^{(L)} V\|_{F}^{2} \right) + λ_{1} \sum_{l = 1}^{2L - 1} \sum_{j = 1}^{r^{(l)}} \mathrm{KL}(ρ \,\|\, ρ_{j}^{(l)}) + \frac{λ_{2}}{2} \sum_{l = 1}^{2L} \|\tilde{W}^{(l)}\|_{F}^{2} \tag{6}$$

To update the parameters of the sDAE, $\{\tilde{W}^{(l)}\}_{l = 1}^{2L}$, the backpropagation algorithm with the gradient descent method is applied. The gradient of $\tilde{W}^{(l)}$ is given by

$$\nabla \tilde{W}^{(l)} = \frac{1}{N} \sum_{n = 1}^{N} \left( ΔH_{n}^{(l)} (\tilde{h}_{n}^{(l - 1)})^{T} + I(L - l)\, ΔT_{n}^{(l)} + λ_{1} I(2L - 1 - l)\, ΔS_{n}^{(l)} (\tilde{h}_{n}^{(l - 1)})^{T} \right) + λ_{2} \tilde{W}^{(l)} \tag{7}$$

in which $Δ H_{n}^{(l)}$ is defined as

$$ΔH_{n}^{(l)} = \begin{cases} (h_{n}^{(l)} - x_{n}) ⊙ φ'(z_{n}^{(l)}), & l = 2L \\ \left( (W^{(l + 1)})^{T} ΔH_{n}^{(l + 1)} \right) ⊙ φ'(z_{n}^{(l)}), & l = 2L - 1, \dots, 2, 1 \end{cases}$$

where the operation $⊙$ denotes the element-wise multiplication. $I (\cdot)$ is an indicator function defined by

$$I(s) = \begin{cases} 1, & s \geq 0 \\ 0, & \text{otherwise} \end{cases}$$

$ΔT_{n}^{(l)} = ΔT_{n}^{(l)}(0) (\tilde{h}_{n}^{(l - 1)})^{T} - \sum_{k = 1}^{K} ΔT_{n}^{(l)}(k) (\tilde{g}_{k}^{(l - 1)})^{T}$ with $ΔT_{n}^{(l)}(0)$ and $ΔT_{n}^{(l)}(k)$ being defined as

$$ΔT_{n}^{(l)}(0) = \begin{cases} (h_{n}^{(l)} - D^{(l)} V_{n}) ⊙ φ'(z_{n}^{(l)}), & l = L \\ \left( (W^{(l + 1)})^{T} ΔT_{n}^{(l + 1)}(0) \right) ⊙ φ'(z_{n}^{(l)}), & l = L - 1, \dots, 2, 1 \end{cases}$$

$$ΔT_{n}^{(l)}(k) = \begin{cases} V_{nk} (h_{n}^{(l)} - D^{(l)} V_{n}) ⊙ φ'(q_{k}^{(l)}), & l = L \\ \left( (W^{(l + 1)})^{T} ΔT_{n}^{(l + 1)}(k) \right) ⊙ φ'(q_{k}^{(l)}), & l = L - 1, \dots, 2, 1 \end{cases} \qquad k = 1, 2, \dots, K$$

in which $V_{n}$ is the $n$ -th column of $V$ . $Δ S_{n}^{(l)} (t)$ is defined as

$$ΔS_{n}^{(l)}(t) = \begin{cases} R^{(l)} ⊙ φ'(z_{n}^{(l)}), & t = 2L - l,\ l = 2L - 1, \dots, 2, 1 \\ \left( (W^{(l + 1)})^{T} ΔS_{n}^{(l + 1)}(t) \right) ⊙ φ'(z_{n}^{(l)}), & t = 1, 2, \dots, 2L - 1 - l,\ l = 2L - 2, \dots, 2, 1 \end{cases}$$

where $R^{(l)}$ is an $r^{(l)}$-dimensional column vector with its $i$-th element being $\left( \frac{- ρ}{ρ_{i}^{(l)}} + \frac{1 - ρ}{1 - ρ_{i}^{(l)}} \right)$, and $ΔS_{n}^{(l)} = \sum_{t = 1}^{2L - l} ΔS_{n}^{(l)}(t)$. The update formula for $\tilde{W}^{(l)}$ is

$$\tilde{W}^{(l)} = \tilde{W}^{(l)} - η_{1} \nabla \tilde{W}^{(l)}$$

where $η_{1}$ is the learning rate.
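The entries of $R^{(l)}$ can be traced back to the sparsity penalty. As a short check, assume that Appendix I-A (not reproduced in this section) uses the standard Kullback-Leibler divergence between two Bernoulli distributions, and that $ρ_{j}^{(l)}$ is the mean activation of neuron $j$ over the $N$ samples. The symbol $h_{n,j}^{(l)}$, denoting the $j$-th element of $h_{n}^{(l)}$, is introduced here only for illustration.

$$\begin{aligned} \mathrm{KL}(ρ \,\|\, ρ_{j}^{(l)}) &= ρ \log \frac{ρ}{ρ_{j}^{(l)}} + (1 - ρ) \log \frac{1 - ρ}{1 - ρ_{j}^{(l)}} \\ \frac{\partial\, \mathrm{KL}(ρ \,\|\, ρ_{j}^{(l)})}{\partial ρ_{j}^{(l)}} &= - \frac{ρ}{ρ_{j}^{(l)}} + \frac{1 - ρ}{1 - ρ_{j}^{(l)}} \\ ρ_{j}^{(l)} = \frac{1}{N} \sum_{n = 1}^{N} h_{n,j}^{(l)} \quad &\Rightarrow \quad \frac{\partial ρ_{j}^{(l)}}{\partial h_{n,j}^{(l)}} = \frac{1}{N} \end{aligned}$$

Under these assumptions, the second line reproduces the $j$-th element of $R^{(l)}$, and the factor $1 / N$ in the third line is consistent with the $\frac{1}{N}$ prefactor that multiplies the $λ_{1}$ term in (7).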

2). The Learning of Dictionary:

To update dictionary $D$ with fixed ${\{{\tilde{W}}^{(l)}\}}_{l = 1}^{2 L}$ and $V$ , problem (3) can be rewritten as

$$\begin{aligned} \min_{D} \quad & \frac{1}{2N} \left( \|X - D V\|_{F}^{2} + \|X^{(L)} - D^{(L)} V\|_{F}^{2} \right) \\ \text{s.t.} \quad & \|d_{k}\|_{2}^{2} \leq 1,\ \forall k = 1, 2, \dots, K \end{aligned} \tag{8}$$

The gradient descent method is used to optimize the above problem and the gradient of $D$ is given by

$$\nabla D = \frac{1}{N} \left( (D V - X) V^{T} + ΔR \right) \tag{9}$$

in which the $k$ -th column of $Δ R$ is computed by $\sum_{n = 1}^{N} Δ R_{n}^{(1)} (k)$ and $Δ R_{n}^{(l)} (k)$ is defined as

$$ΔR_{n}^{(l)}(k) = \begin{cases} (W^{(l)})^{T} \left( V_{nk} (D^{(l)} V_{n} - h_{n}^{(l)}) ⊙ φ'(q_{k}^{(l)}) \right), & l = L \\ (W^{(l)})^{T} \left( ΔR_{n}^{(l + 1)}(k) ⊙ φ'(q_{k}^{(l)}) \right), & l = L - 1, \dots, 2, 1 \end{cases} \qquad k = 1, 2, \dots, K$$

The update formula for $D$ is

$$D = D - η_{2} \nabla D$$

where $η_{2}$ is the learning rate. Considering the constraint on the dictionary, each column of the updated dictionary $D$ is normalized to unit length by

$$d_{k} = \frac{1}{\|d_{k}\|_{2}} d_{k} \tag{10}$$
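One dictionary update, combining the gradient step (9) with the column normalization (10), might look like the sketch below. It is deliberately simplified: the encoding-space contribution $ΔR$ requires backpropagation through the encoder and is passed in as an optional precomputed array (`delta_R`, a hypothetical argument), and a plain gradient step is used even though the experiments reported later use RMSProp.

```python
import numpy as np

def dictionary_step(D, X, V, eta, delta_R=None):
    """One plain gradient step on D followed by column normalisation.

    delta_R: optional (p, K) array standing in for the encoding-space term in Eq. (9);
    when omitted, only the original-space reconstruction term is used (a simplification).
    """
    N = X.shape[1]
    grad = (D @ V - X) @ V.T
    if delta_R is not None:
        grad = grad + delta_R
    D_new = D - eta * grad / N
    # Normalise every atom to unit length; the small floor guards against zero columns.
    norms = np.linalg.norm(D_new, axis=0, keepdims=True)
    return D_new / np.maximum(norms, 1e-12)
```

Normalizing after each step is a simple way to keep every atom on the boundary of the constraint $\|d_{k}\|_{2}^{2} \leq 1$.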

3). Sparse Representations Learning:

With the fixed ${\{{\tilde{W}}^{(l)}\}}_{l = 1}^{2 L}$ and $D$ , the sparse representations can be obtained by solving the following optimization problem

$$\min_{V} f(V) + g(V) \tag{11}$$

where $f (V)$ and $g (V)$ are

$$f(V) = \frac{1}{2N} \left( \|X - D V\|_{F}^{2} + \|X^{(L)} - D^{(L)} V\|_{F}^{2} \right) + \frac{λ_{3}}{2} \mathrm{tr}(V H V^{T}) + \frac{λ_{4}}{2} \mathrm{tr}(V L V^{T}) + \frac{λ_{5}}{2} \|V\|_{F}^{2}$$

$$g(V) = λ_{6} \|V\|_{1}$$

To ensure the convexity of $f (V)$ , $λ_{5} > λ_{3} \geq 0$ holds and the details can be seen in Appendix I-B. In problem (11), $f (V)$ is convex and differentiable, while $g (V)$ is convex but nondifferentiable. Thus, the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [26] is adopted to optimize $V$ . The gradient of $f (V)$ with respect to $V$ is

$$\nabla V = \frac{1}{N} \left( D^{T} (D V - X) + (D^{(L)})^{T} (D^{(L)} V - X^{(L)}) \right) + V S \tag{12}$$

in which $S = λ_{3} H + λ_{4} L + λ_{5} I$. The Lipschitz constant $L_{f}$ of the gradient $\nabla V$ is given by (13) in Appendix I-C. Besides, the soft thresholding function in FISTA is defined as $ST_{\frac{λ_{6}}{L_{f}}}(\cdot) = \mathrm{sign}(\cdot) \max\{0, |\cdot| - \frac{λ_{6}}{L_{f}}\}$ with $|\cdot|$ representing the absolute value function. The total optimization process is described in Algorithm 1.
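The sparse-representation update can be sketched with a textbook FISTA loop built from the gradient (12) and the soft thresholding function. Two points are assumptions rather than the paper's choices: the step size uses the upper bound $\|A\|_{2} + \|S\|_{2}$ with $A = (D^{T} D + (D^{(L)})^{T} D^{(L)}) / N$ in place of the constant given by (13) in Appendix I-C, which is not reproduced here, and the iteration count is fixed instead of using a stopping rule. The names `fista` and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Element-wise sign(z) * max(0, |z| - tau)."""
    return np.sign(Z) * np.maximum(0.0, np.abs(Z) - tau)

def fista(X, D, XL, DL, S, lam6, n_iter=200):
    """Minimise f(V) + lam6 * ||V||_1 for fixed D, encodings XL = X^(L), DL = D^(L).

    S is the symmetric N x N matrix lam3*H + lam4*L + lam5*I.
    """
    N = X.shape[1]
    A = (D.T @ D + DL.T @ DL) / N
    B = (D.T @ X + DL.T @ XL) / N
    # Upper bound on the Lipschitz constant of grad f (assumption, see lead-in).
    Lf = np.linalg.norm(A, 2) + np.linalg.norm(S, 2)
    V = np.zeros((D.shape[1], N))
    Y, t = V.copy(), 1.0
    for _ in range(n_iter):
        grad = A @ Y - B + Y @ S                      # Eq. (12) evaluated at Y
        V_next = soft_threshold(Y - grad / Lf, lam6 / Lf)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = V_next + ((t - 1.0) / t_next) * (V_next - V)
        V, t = V_next, t_next
    return V
```

Any valid upper bound on the Lipschitz constant keeps the iteration convergent; a looser bound only makes the steps smaller.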

IV. Results and Analysis

In this section, EMDDL is used to explore the dynamic functional connectivity changes of the brain during two tasks and the resting state.

A. Data Acquisition and Preprocessing

PNC is a large-scale collaborative project between the Brain Behaviour Laboratory at the University of Pennsylvania and the Children's Hospital of Philadelphia, which contains data collected using three fMRI paradigms from nearly 900 youth aged from 8 to 22, i.e., two tasks including emotion identification (Emoid fMRI) and working memory (Nback fMRI), and resting state (Rest fMRI) [27]. All fMRI scans were collected on a single 3T Siemens TIM Trio whole-body scanner using a single-shot, interleaved multi-slice, gradient-echo, echo planar imaging sequence. The Emoid fMRI, Nback fMRI and Rest fMRI scan durations were 10.5 minutes (210 TR), 11.6 minutes (231 TR) and 6.2 minutes (124 TR) respectively. During the Emoid task, subjects were asked to identify 60 faces with neutral, happy, sad, angry, or fearful expressions. During the Nback task to probe working memory, subjects were required to respond only when a presented fractal was the same as the one presented in the previous trial. During the resting-state scan, subjects were instructed to stay awake, keep their eyes open, fixate on the displayed crosshair, and remain still. Of these, 123 children and 146 young adults completed all three paradigms. Using Statistical Parametric Mapping 12, motion correction, co-registration, spatial normalization to standard Montreal Neurological Institute space (spatial resolution of $3 \times 3 \times 3$ mm), and spatial smoothing with a 3 mm full width at half maximum Gaussian kernel were implemented. Then, a regression procedure was used to remove the influence of motion, and the functional time series were band-pass filtered using a 0.01 Hz to 0.1 Hz frequency range. According to the Power coordinates with a sphere radius parameter of 5 mm [28], 264 regions of interest (ROIs) containing 21384 voxels were extracted. The details of the 264 ROIs are shown in Table 2 of the Supplementary material. Every subject file can be reduced to a $264 \times T$ matrix by averaging the time series of all voxels in the same brain region, where the number of time points $T$ is 210, 231, and 124 for Emoid, Nback, and Rest fMRIs respectively.


We divided the 264 ROIs into 13 functional networks to facilitate the understanding of functional connectivity relationships between the ROIs [28]. Among them, 12 functional networks, namely the sensory/somatomotor network (SSN), cingulo-opercular task control network (COTCN), auditory network (AN), default mode network (DMN), memory retrieval network (MRN), visual network (VN), frontoparietal task control network (FPTCN), salience network (SN), subcortical network (SCN), ventral attention network (VAN), dorsal attention network (DAN), and cerebellar network (CN), are mainly associated with the perception of movement, memory, language, vision, cognition and other functions of the brain. The remaining 28 ROIs are not related to any of the above functional networks and are assigned to the uncertain network (UN).

To capture the dynamic characteristics of the brain, dFC is obtained by calculating the Pearson correlation between the time courses of the BOLD signals of paired regions within a window [29]–[31]. The details of obtaining the multimodal data can be seen in Appendix I-D. By grid search, we choose the window length $w_{l}$ to be 14, 17, and 33 for Emoid, Nback, and Rest fMRIs respectively, and the scan length $s_{l}$ is 1 for all three modalities. Thus, each subject provides a dFC matrix $M_{dFC} \in ℝ^{C_{264}^{2} \times S_{l}}$ for each of the Emoid, Nback, and Rest fMRIs, where $C_{264}^{2} = 34716$. To reduce the computational complexity, systematic sampling is used to select 20 sub-sequences from the dFC matrix corresponding to each modality of each subject [4]. The training data contain 80% of the subjects and the remaining subjects form the test data.
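The sliding-window computation can be illustrated as follows. This is a generic sketch, not the procedure of Appendix I-D, which is not reproduced here: it assumes a rectangular window with no tapering or Fisher z-transform, and `step` is a hypothetical name for the shift between consecutive windows (interpreting it as the quantity set to 1 in the text is an assumption).

```python
import numpy as np

def sliding_window_dfc(ts, window, step=1):
    """Windowed Pearson correlations for a (n_roi, T) array of ROI time courses.

    Returns an array of shape (n_roi * (n_roi - 1) / 2, n_windows): each column
    holds the upper-triangular correlations of one window.
    """
    n_roi, T = ts.shape
    iu = np.triu_indices(n_roi, k=1)              # index pairs (i, j) with i < j
    starts = range(0, T - window + 1, step)
    cols = [np.corrcoef(ts[:, s:s + window])[iu] for s in starts]
    return np.column_stack(cols)
```

For 264 ROIs, each column has $264 \times 263 / 2 = 34716$ entries, matching $C_{264}^{2}$ above.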

B. Experimental Results

To evaluate the performance of the algorithm, the signal-to-noise ratio (SNR) [32] is used as the evaluation index, which is defined as

$$SNR = 10 \log_{10} \left( \frac{\|X\|_{F}^{2}}{\|X - D V\|_{F}^{2}} \right)$$
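As a small illustration of this definition, the SNR of a reconstruction $D V$ can be computed directly from the squared Frobenius norms. The function name `snr` is hypothetical.

```python
import numpy as np

def snr(X, D, V):
    """10 * log10(||X||_F^2 / ||X - D V||_F^2)."""
    signal = np.sum(X ** 2)
    residual = np.sum((X - D @ V) ** 2)
    return 10.0 * np.log10(signal / residual)
```

A higher value means that the residual energy is smaller relative to the energy of the data, so a better reconstruction gives a larger SNR.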

Because the grid search method performs a complete search over a given hyperparameter space and can easily be parallelized to find more stable optimal hyperparameters [33], [34], it is used to select appropriate hyperparameters. Specifically, one hyperparameter is selected by grid search while the other hyperparameters are held fixed. By repeating this process, all hyperparameters are optimized, and the results are shown in Figure 1 of the Supplementary material. The sDAE has 7 layers with 34716, 10000, 6000, 1000, 6000, 10000, and 34716 units respectively. The number of atoms $K$ is 300, the sparsity parameter $ρ$ is 0.1, the regularization coefficients $λ_{1}$, $λ_{2}$, $λ_{3}$, $λ_{4}$, $λ_{5}$, and $λ_{6}$ are 0.0001, 0.0005, 0.0003, 0.0001, 0.0005, and 0.001 respectively, and the number of nearest neighbors $k$ of the hypergraph is 9. Because problems (6) and (8) are nonconvex, the RMSProp algorithm is used to update $\{\tilde{W}^{(l)}\}_{l = 1}^{2L}$ and $D$, owing to its better generalization ability and lower tendency to overfit [35]. For the RMSProp algorithm, the learning rates $η_{1}$ and $η_{2}$ are 0.00005 and 0.00008 respectively, and the squared gradient decay rates $ξ_{1}$ and $ξ_{2}$ are both 0.9.
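For readers unfamiliar with RMSProp, one update step in its common textbook form is sketched below. The paper does not spell out its exact variant, so this is an assumption: `decay` corresponds to the squared gradient decay rate ($ξ_{1}$ or $ξ_{2}$), `lr` to the learning rate ($η_{1}$ or $η_{2}$), and `eps` is a hypothetical small constant added for numerical stability.

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr, decay=0.9, eps=1e-8):
    """One RMSProp update (textbook form).

    cache is the running average of squared gradients, same shape as param.
    Returns the updated parameter and the updated cache.
    """
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```

In the setting above, the same step would be applied to each $\tilde{W}^{(l)}$ with its gradient from (7) and to $D$ with its gradient from (9), followed by the normalization (10) for $D$.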

Based on the optimal hyperparameters, we apply EMDDL to the training data to obtain the dictionary and sparse representations. The learning curves of the loss functions and the SNR evaluation on both training data and testing data are shown in Fig. 2. The results indicate that the sparse representations can characterize the local geometric relationships between data and dictionary in the two spaces. The SNRs of EMDDL, multimodal dictionary learning (MDL) [36], and sparse deep dictionary learning (SDDL) [4] on the testing data are shown in Table I. The multimodal methods reconstruct better than the single-modality method, and the generalization ability of EMDDL in reconstruction is better than that of the other two methods. This suggests that integrating the multimodal prior, or combining the complementary information from diverse modalities, can improve the model.

Fig. 2: — The learning curves of loss functions and the SNR evaluation on both training data and testing data of EMDDL. The curve is formed by the average of 10 repetitions, and the gray shadow is formed by the standard deviation of 10 repetitions.

TABLE I: The SNR on testing data of various methods.

Paradigm	EMDDL (multimodal)	MDL (multimodal)	SDDL (single modality)
Emoid	3.2577	2.6416	0.9338
Nback	3.8650	3.2029	1.2219
Rest	4.8046	4.1108	1.0622


C. States Analysis of Multimodal Data

To find the differences in reoccurring patterns of dFC (i.e., states) between children and young adults, the $k$-means clustering method with the cityblock distance metric is used to obtain the reoccurring patterns of each group in each modality [37]. Specifically, the sparse representations of each group in each modality are clustered, and the states are then obtained by multiplying the dictionary by the cluster centroids. We use the elbow criterion, based on the within-cluster sums of distances, to estimate the optimal number of dFC states; the optimal numbers for Emoid, Nback, and Rest fMRIs are 5, 5, and 4 respectively. To test whether the clustering results are consistent across multiple subgroups, we use the kappa coefficient as the indicator [38]; the details can be found in the Supplementary material. The results indicate that the clustering results obtained from two different subgroups are, with high probability, in substantial or perfect agreement. For the Emoid task, the proportions of each state for children are 14.07%, 22.28%, 17.52%, 25.53%, and 20.61% respectively, while the proportions of each state for young adults are 9.08%, 21.06%, 14.55%, 27.29%, and 28.01% respectively. For the Nback task, the proportions of these states for children are 11.14%, 19.19%, 24.23%, 22.97%, and 22.48% respectively, while the proportions for young adults are 7.5%, 14.28%, 22.53%, 25.58%, and 30.1% respectively. For Rest fMRI, the proportions of these states for children are 31.14%, 29.35%, 34.15%, and 5.37% respectively, while the proportions for young adults are 21.64%, 13.87%, 27.43%, and 37.05% respectively.
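The clustering step can be sketched as below, assuming that the cityblock variant updates each centroid as the coordinate-wise median of its members (the usual choice for an L1 objective). The function `kmedians_cityblock`, the farthest-point initialization, and the toy sizes are assumptions made for this illustration; the paper does not specify its implementation.

```python
import numpy as np

def kmedians_cityblock(Z, k, n_iter=100):
    """Cluster the rows of Z with L1 distance; centroids are coordinate-wise medians."""
    # Farthest-point initialization (an assumption, chosen to be deterministic).
    centroids = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.abs(Z - c).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # L1 distance from every sample to every centroid: shape (n_samples, k).
        dist = np.abs(Z[:, None, :] - centroids[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = centroids.copy()
        for c in range(k):
            if np.any(labels == c):  # keep the old centroid if a cluster is empty
                new[c] = np.median(Z[labels == c], axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy sparse representations V (atoms x samples) and dictionary D (features x atoms).
rng = np.random.default_rng(0)
V = np.hstack([rng.normal(0.0, 0.1, (4, 20)), rng.normal(5.0, 0.1, (4, 20))])
D = rng.standard_normal((6, 4))
labels, centroids = kmedians_cityblock(V.T, k=2)   # cluster the samples
states = D @ centroids.T                           # one column per state
print(states.shape)
```

Each column of `states` is a centroid mapped back to the connectivity space through the dictionary, which mirrors the "dictionary times centroid" step in the text.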

To further investigate the time occupied divergence of each state, dwell time (DT) and fraction of time (FT) are estimated from the state transition vector [7]. DT represents how long an individual spends in a given state on average, and FT describes the proportion of the total time spent in a given state. For a subject $i$, the DT and FT of the $k$-th state are defined by

DT^{\mathrm{state}(k)} = \mathrm{mean}(TR_{\mathrm{end}} - TR_{\mathrm{start}})

FT^{\mathrm{state}(k)} = \frac{\mathrm{sum}(\mathrm{state\_vector}_{(i)} == k)}{\text{Total number of windows}}

where $TR_{\mathrm{start}}$ and $TR_{\mathrm{end}}$ are computed by

TR_{\mathrm{start}} = \mathrm{count}(\mathrm{difference}(\mathrm{state\_vector}_{(i)}, k) == 1)

TR_{\mathrm{end}} = \mathrm{count}(\mathrm{difference}(\mathrm{state\_vector}_{(i)}, k) == -1)

in which "1" and "−1" indicate whether or not the specific window of the $i$-th subject belongs to state $k$; $\mathrm{state\_vector}_{(i)}$ contains the states of the $i$-th subject in all windows. The reoccurring patterns and time occupied divergence of Emoid, Nback, and Rest fMRIs for children and young adults are shown in Figures 3–5, where *, **, ***, ****, and ***** denote the significance levels 0.05, 0.01, 0.001, 0.0001, and 0.00001 respectively. In Figures 3–5 B, the dots represent the DT or FT values in each state of all subjects, and the blocks represent the box plot (i.e., the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively) of the DT or FT values in each state of all subjects. Moreover, we visualize the top 100 significant age-related functional connections in each state (i.e., the functional connections corresponding to the 100 smallest FDR-corrected $p$-values of a two-sample $t$-test performed across subjects' mean dFC by state) under Emoid, Nback, and Rest, which are shown in Figures 2–4 of the Supplementary material.
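The DT and FT definitions above can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' code: the function name `dwell_and_fraction` is hypothetical, both quantities are expressed in windows, and padding the indicator with zeros (so that visits touching the first or last window are counted) is an assumption.

```python
import numpy as np

def dwell_and_fraction(state_vector, k):
    """Return (DT, FT) of state k for one subject's sequence of window states."""
    state_vector = np.asarray(state_vector)
    in_state = (state_vector == k).astype(int)
    # Pad with zeros so runs touching either end of the scan are still closed.
    change = np.diff(np.concatenate(([0], in_state, [0])))
    starts = np.flatnonzero(change == 1)    # windows where a visit begins
    ends = np.flatnonzero(change == -1)     # windows where a visit has ended
    dt = float(np.mean(ends - starts)) if len(starts) else 0.0
    ft = float(in_state.mean())
    return dt, ft

states = [1, 1, 2, 2, 2, 1, 3, 3, 1, 1]
print(dwell_and_fraction(states, 2))  # one visit of 3 windows, 3 of 10 windows in total
```

For state 1 in the same toy vector there are three visits of lengths 2, 1, and 2, so DT is 5/3 windows while FT is 0.5; the two measures therefore capture different aspects of state occupancy.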

Fig. 3: — A. The states for the child groups (top) and young adult groups (bottom) during Emoid task. B. (a) and (b) are the values as well as box plot of DT and FT in each state for children and young adults during Emoid task respectively. (c) and (d) are the mean of DT and FT of all subjects for children and young adults in each state during Emoid task respectively.

Fig. 5: — A. The states for the child groups (top) and young adult groups (bottom) during rest. B. (a) and (b) are the values as well as box plot of DT and FT in each state for children and young adults during rest respectively. (c) and (d) are the mean of DT and FT of all subjects for children and young adults in each state during rest respectively.

To study the changes in reoccurring patterns over time under two tasks and resting state, we define the transition probabilities $P_{i j}$ from time $t$ to time $t + 1$ as follows

P_{ij} = \frac{\mathrm{sum} {\{I_{(S_{t}^{n} = i, S_{t + 1}^{n} = j)} == 1\}}_{n = 1}^{N}}{\mathrm{sum} {\{I_{(S_{t}^{n} = i)} == 1\}}_{n = 1}^{N}}, \quad i, j = 1, 2, \dots, s

where $S^{n} = (S_{1}^{n}, S_{2}^{n}, \dots, S_{T}^{n}) \in ℝ^{1 \times T}$ is the state vector for the $n$-th subject, and $S_{t}^{n} = i$ for $i = 1, 2, \dots, s$ ($s$ is 5, 5, and 4 for Emoid, Nback, and Rest respectively) means that the $n$-th subject is in state $i$ at time $t$. $I_{(\cdot)}$ is an indicator function, which is 1 when the condition is true and 0 otherwise. The probability of each state at the initial time is defined as

P_{i} = \frac{\mathrm{sum} {\{I_{(S_{1}^{n} = i)} == 1\}}_{n = 1}^{N}}{N}, \quad i = 1, 2, \dots, s

Specifically, for a given state $i_{t}$ at time $t$, we can calculate the transition probabilities $P_{i_{t} j}$ for $j = 1, 2, \dots, s$ from time $t$ to time $t + 1$. Then we record the maximum transition probability and the corresponding state at time $t + 1$, and denote them as $P_{i_{t} i_{t + 1}}$ and $i_{t + 1}$ respectively. By repeating the above steps, we can obtain the state transition curve with maximum state transition probability, which is shown in Figures 5–7 A of the Supplementary material. To further explore how the strength of functional connectivity changes over time, we count the proportions of enhanced and weakened functional connections within or between functional networks during state transition, which are shown in Figures 5–7 B of the Supplementary material. Finally, we visualize the differences between the functional connectivity matrices of two adjacent states at each state transition point for each group under the Emoid and Nback tasks, which are shown in Figure 8 of the Supplementary material.
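The procedure above can be sketched as follows, under the reading that the transition probabilities are estimated separately for each time step by counting across subjects. The function names, the 0-based state coding (the text uses $1, \dots, s$), and the handling of states that no subject occupies at time $t$ (the row is left at zero) are assumptions of this sketch.

```python
import numpy as np

def transition_probabilities(S, s):
    """S: (N, T) integer array with states coded 0..s-1.
    Returns P of shape (T-1, s, s); P[t, i, j] estimates the probability of
    being in state j at t+1 given state i at t, counted across the N subjects."""
    N, T = S.shape
    P = np.zeros((T - 1, s, s))
    for t in range(T - 1):
        for i in range(s):
            at_i = S[:, t] == i
            n_i = at_i.sum()
            if n_i == 0:
                continue  # no subject in state i at time t; row stays zero
            for j in range(s):
                P[t, i, j] = np.sum(at_i & (S[:, t + 1] == j)) / n_i
    return P

def initial_probabilities(S, s):
    """Fraction of subjects in each state at the first time point."""
    return np.bincount(S[:, 0], minlength=s) / S.shape[0]

def greedy_path(P, start):
    """Follow the most probable transition at every time step."""
    path, probs = [start], []
    for t in range(P.shape[0]):
        nxt = int(np.argmax(P[t, path[-1]]))
        probs.append(float(P[t, path[-1], nxt]))
        path.append(nxt)
    return path, probs

# Toy example: 4 subjects, 3 time points, 2 states.
S = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])
P = transition_probabilities(S, s=2)
P0 = initial_probabilities(S, s=2)
path, probs = greedy_path(P, start=0)
print(path, probs)
```

Running `greedy_path` once per initial state gives one maximum-probability curve per starting state, which is the kind of trajectory described in the text.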

V. Discussion

A. Developmental Differences

1). The Common Developmental Differences of Three fMRI Paradigms:

Figures 3–5 A show the reoccurring patterns of the three paradigms for both children and young adults. For the child group, we found that states 1, 2, and 3 in the resting state are similar to the Emoid states 2, 3, and 4 (Pearson correlation coefficients of 0.9687, 0.9631, and 0.9631 respectively) and the Nback states 5, 2, and 3 (Pearson correlation coefficients of 0.9744, 0.9856, and 0.9602 respectively). Analogous conclusions hold for the young adult group, where all reoccurring patterns in the resting state are similar to Emoid states 2, 3, 4, and 5 (Pearson correlation coefficients of 0.9379, 0.9561, 0.9191, and 0.9588 respectively) and Nback states 3, 2, 5, and 4 (Pearson correlation coefficients of 0.9552, 0.9612, 0.9566, and 0.9696 respectively). This indicates that the reoccurring patterns of the three paradigms are similar for a subject. The same conclusion can be found in previous research [39], which reveals that, whether in the resting state or during a task, the basic structure of the brain functional network remains relatively consistent. This finding indicates that the brain has a shared functional architecture during rest and many directed tasks, and that this shared architecture only modulates the connectivity pattern in response to task demands. In other words, the overlapping functional connectivity patterns between Rest fMRI and the two task fMRIs suggest a shared functional architecture underlying and even shaping brain function, and a potential explanation of the overlap is that the functional connectivity during rest constrains the activation of brain regions in response to task demands [40].
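A comparison of this kind, in which each resting-state pattern is paired with its most correlated task pattern, could be sketched as follows. The function `match_states` and the synthetic patterns are hypothetical; the paper does not describe how the pairs were selected, so taking the maximum Pearson correlation is an assumption.

```python
import numpy as np

def match_states(ref_states, other_states):
    """Columns are vectorized connectivity patterns. For each column of
    ref_states, return the index of the most correlated column of
    other_states together with that Pearson correlation."""
    n_ref = ref_states.shape[1]
    # np.corrcoef treats rows as variables, so pass the transposes.
    corr = np.corrcoef(ref_states.T, other_states.T)[:n_ref, n_ref:]
    best = corr.argmax(axis=1)
    return best, corr[np.arange(n_ref), best]

# Synthetic example: three "rest" patterns are noisy copies of task patterns 1, 2, 3.
rng = np.random.default_rng(0)
task_states = rng.standard_normal((100, 5))
rest_states = task_states[:, [1, 2, 3]] + 0.05 * rng.standard_normal((100, 3))
best, r = match_states(rest_states, task_states)
print(best, np.round(r, 3))
```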

Although the brain shares the basic functional architecture during task and resting state, the basic functional organization differs between children and young adults. The number of within- or between-network pairs in which children exhibit high-strength functional connections is 43 in state 1 and 2 in state 3 under the Emoid task, 55 in state 1 and 10 in state 2 under the Nback task, and 13 in state 2 under the resting state. The corresponding number for young adults is 9 in state 1 under the Emoid task, 20 in state 1 under the Nback task, and 2 in state 2 under the resting state. For all three fMRI paradigms, we found that children have many high-strength functional connections distributed widely among the 13 functional networks, whereas young adults have high-strength functional connections only within and between some functional networks. This is consistent with previous studies reporting that children show more diffuse functional connectivity patterns while young adults show more focused functional connectivity patterns, and the changes in functional connectivity patterns between children and young adults explain how brain function changes from an undifferentiated system to a specialized system as one grows up [3], [4]. A brain organization with distinct and stronger within-network communication can promote precise modulation efficiently because it can transfer more information in a short time [41]. Thus, compared with children with more diffuse functional connectivity patterns, the brain organization of young adults with more focused functional connectivity patterns can transmit information more efficiently and facilitate precise modulation during rest and the two tasks.

Additionally, the functional connectivity among DMN, SCN, MRN, CN, AN, FPTCN, and SN decreases in most reoccurring patterns for Emoid, Nback, and Rest fMRIs during development. DMN, the so-called task-negative network, is broadly inactivated across tasks and is closely related to numerous key brain functions such as integration of autobiographic information, self-monitoring, and social cognition [28]. It is reported that the functional activity in DMN never stops but regulates during the resting state [42]. SCN participates in memory, attention, perception, and consciousness, and dominates the motivation and emotion state independent control of cortical functions [43]. MRN is reported to be engaged during autobiographical memory retrieval that involves strategic search processes guided by self-knowledge and current goals, memory recovery associated with a rich sense of reexperience, monitoring, and other control processes [44]. CN is not only considered the domain of motor control, which receives information from widespread regions to affect the generation and control of movement, but is also thought to be involved in cognition and visuospatial reasoning [45]. AN, innervated by autonomic nerves, is involved in activities related to sound information, including collection, conduction, and processing [46], [47]. FPTCN, which is involved in working memory maintenance, predictive perceptual coding, and cognitive tasks, is thought to play an important part in mediating the allocation of attentional resources to compete for auditory information under varying degrees of perceptual demand [48]. SN is thought to regulate attention and behavior adaptively through the physical characteristics and the relevant information of the task, and is also considered to be a key interface for cognitive, homeostatic, motivational, and affective systems [49]. Both resting and task fMRIs suggest that the functions of the brain in processing information, working memory, and cognition are not mature in children compared with young adults [3].

The functional connectivity between SSN, COTCN, DAN, and some other functional networks increases in some reoccurring patterns for resting and task fMRIs during development. SSN participates in the process of emotional feeling and cognitive activities [50]. COTCN is key to coordinating information transmission and is involved in many complex cognitive tasks [51]. DAN controls external and attention-demanding cognitive functions [52]. The three fMRI paradigms indicate that brain functions related to emotional feelings, cognition, and information transmission are still growing with age.

2). The Developmental Differences of Each fMRI Paradigm:

Figures 3–5 B show the time occupied divergence of children and young adults during task and resting state. Both children and young adults have lower DT and FT in each state for two tasks while having higher DT and FT in each state during rest. It indicates that subjects including children and young adults tend to switch frequently among states in tasks and prefer to stay in a particular state while resting. It reveals that the spontaneous functional activity is stable during resting state, and then the functional activity corresponding to task demands changes quickly when the participant is required to perform a task [53].

For Emoid fMRI, both children and young adults stay in states 2, 3, and 4 for about the same time, but children stay longer in state 1 while young adults stay longer in state 5. Under the Emoid task, whether the initial state is 2, 3, 4, or 5, the child group will eventually switch to state 4 at time 9, and then switch back and forth between state 2 and state 4. When the initial state is 1, the child group will stay in state 1 for most of the time and then switch to state 3. Whatever the initial state is, the young adult group will eventually switch to state 5 at time 5 and stay in state 5 for a long while, and then switch to state 4 at time 18. The result of the Emoid task indicates that children have more frequent state transitions between state 2 and state 4, and that the strength of functional connections within or between functional networks changes over time. Compared with children, the strength of functional connectivity within or between functional networks decreases at the early stage for young adults, and then they prefer to stay in state 5.

For Nback fMRI, both children and young adults stay in states 2, 3, and 4 for about the same time, but children stay longer in state 1 while young adults stay longer in state 5. Under the Nback task, whatever the initial state is, the child group will eventually switch to state 4 at time 9, and then stay in state 4 until switching to state 3 at time 18. The young adult group switches between state 4 and state 5 after time 7 from any initial state. The result of the Nback task indicates that the strength of functional connectivity for children changes over time during the frequent state transitions at an early time, and then they stay in state 4 for a while and finally switch to state 3. Unlike children, young adults prefer to stay for a while after switching to state 4 or state 5, and the strength of functional connectivity within or between functional networks decreases first, then increases, and then decreases during state transitions between state 4 and state 5.

For Rest fMRI, both children and young adults stay in state 3 for about the same time. Children stay longer in state 1 and state 2, whereas young adults prefer to stay in state 4. Under the resting state, both children and young adults prefer to stay in a specified state with no change in the strength of functional connectivity within or between functional networks. We found that children prefer to switch among states with diffuse functional connectivity patterns during the two tasks and stay in states with diffuse functional connectivity patterns during rest. On the other hand, young adults switch among states with focused functional connectivity patterns in the two task fMRIs and stay in states with focused functional connectivity patterns during rest.

For Emoid fMRI, along with the enhanced functional connectivity among SSN, COTCN, and DAN with age in states 4 and 5, the functional connectivity in the remaining states declines to various degrees. For Nback fMRI, there are enhanced functional connections within and between the 13 functional networks in state 3 during development, while the functional connectivity in the remaining states decreases with age. For Rest fMRI, in states 1, 2, and 4, there are not only weakened functional connections, which mainly exist among SCN, MRN, CN, DMN, AN, FPTCN, and SN, but also strengthened functional connections, which are mainly among SSN, COTCN, and DAN. In state 3 of Rest fMRI, the functional connections within and between the 13 functional networks are enhanced during development. We found that, compared with children, the functional connectivity of young adults increases or decreases with time for resting fMRI while generally decreasing for the two tasks. This indicates that the changes of functional connectivity with age are more complex at rest, and that the brain functions related to emotion and working memory become more mature and efficient during development [4], [41].

B. Future Work

In this paper, the functional connections of three fMRI paradigms were used for learning the modality-shared dictionary and modality-specific sparse representations. An interesting note is that the dynamic functional connections of multiple modalities from multiple subjects can be treated as a higher-order tensor by considering the time dimension, subject dimension, or modality dimension. A higher-order tensor can maintain the structural relationships among different dimensions of the data, which may be lost in a low-dimensional form. Besides, by considering the brain regions as nodes of a graph, the functional connections can be viewed as the weight matrix of the edges of the graph. Compared with regarding the functional connections as a feature vector, considering the functional connections as a graph feature can help to discover the structural relationships among brain regions. Hence, a meaningful direction for future work is to incorporate the structural information of different dimensions of the data, or the graph structural information of the brain, into EMDDL, which may help to improve the model performance and capture the intrinsic topological structure of the brain.

VI. Conclusion

In this paper, we present an explainable multimodal deep dictionary learning method to capture the developmental differences between children and young adults from three fMRI paradigms. Specifically, the shared dictionary and the modality-specific sparse representations are learned based on the multimodal data and their encodings of the sDAE to simultaneously reveal the commonality and specificity of different paradigms. By applying the proposed method to the three fMRI paradigms from PNC, we found that children share a diffuse functional connectivity pattern while young adults share a focused functional connectivity pattern during both rest and the two tasks. The three fMRI paradigms reveal that, compared with children, young adults possess more mature and efficient functional networks for processing information. Children and young adults rarely transition from one state to another during rest and prefer to switch among states over time during a task.

Supplementary Material

NIHMS1918682-supplement-supp1-3244921.pdf (4.2 MB, PDF)

Fig. 4: — A. The states for the child groups (top) and young adult groups (bottom) during Nback task. B. (a) and (b) are the values as well as box plot of DT and FT in each state for children and young adults during Nback task respectively. (c) and (d) are the mean of DT and FT of all subjects for children and young adults in each state during Nback task respectively.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 12090021 and No. 12271429), the National Key Research and Development Program of China (No. 2020AAA0106302), the Natural Science Basic Research Program of Shaanxi (No. 2022JM005), and was partly supported by the National Institutes of Health (R01 MH104680, R01 GM109068, R01 MH121101, R01 MH116782, R01 MH118013 and P20-GM144641) and the HPC Platform, Xi'an Jiaotong University. (Corresponding authors: Chen Qiao and Yuping Wang)

Appendix I

A. The Comparison of Sparsity of Activations Among $L_{1}$ -norm, $L_{2}$ -norm, and Kullback-Leibler Divergence

Let $ρ$ be a small positive constant between 0 and 1, and let $ρ_{j}^{(l)} = \frac{1}{N} \sum_{n = 1}^{N} h_{n j}^{(l)}$, with $h_{n j}^{(l)}$ being the $j$-th element of $h_{n}^{(l)}$, be the average activation of neuron $j$ in the $l$-th layer. Then the Kullback-Leibler divergence is defined as

KL (ρ \| ρ_{j}^{(l)}) = ρ \log \frac{ρ}{ρ_{j}^{(l)}} + (1 - ρ) \log \frac{1 - ρ}{1 - ρ_{j}^{(l)}}

The penalty functions based on $L_{1}$ -norm and $L_{2}$ -norm are defined as

f_{L_{1}} = \sum_{l = 1}^{2 L - 1} \sum_{j = 1}^{r^{(l)}} |ρ_{j}^{(l)}|

f_{L_{2}} = \sum_{l = 1}^{2 L - 1} \sum_{j = 1}^{r^{(l)}} {(ρ_{j}^{(l)})}^{2}

The sparsity and SNR obtained with the $L_{1}$-norm, the $L_{2}$-norm, and the Kullback-Leibler divergence have been compared, and the results are shown in Figure 9 of the Supplementary material. The results show that the Kullback-Leibler divergence yields better sparsity than the $L_{1}$-norm in most hidden layers, and that the $L_{2}$-norm yields the worst sparsity among the three penalty functions. The SNR evaluation of EMDDL on the multimodal data in both the original space and the encoding space shows that EMDDL based on the Kullback-Leibler divergence has better reconstruction ability than EMDDL based on the $L_{1}$-norm or the $L_{2}$-norm.
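For concreteness, the three penalties can be evaluated on the activations of a single hidden layer as sketched below. The function name `sparsity_penalties`, the clipping constant `eps`, and the restriction to one layer are assumptions of this sketch (the definitions above sum over the hidden layers $l = 1, \dots, 2L - 1$); summing the Kullback-Leibler term over the units of the layer is likewise assumed here.

```python
import numpy as np

def sparsity_penalties(H, rho=0.1, eps=1e-12):
    """H: (N, r) activations of one hidden layer, with values in (0, 1).
    Returns the KL, L1, and L2 penalties of the average activations."""
    rho_hat = H.mean(axis=0)                   # average activation of each unit
    rho_hat = np.clip(rho_hat, eps, 1 - eps)   # avoid log(0) and division by zero
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    l1 = np.sum(np.abs(rho_hat))
    l2 = np.sum(rho_hat ** 2)
    return kl, l1, l2

H_sparse = np.full((10, 3), 0.1)   # average activations equal to rho
H_dense = np.full((10, 3), 0.5)    # average activations far from rho
print(sparsity_penalties(H_sparse))
print(sparsity_penalties(H_dense))
```

The Kullback-Leibler term vanishes when the average activation equals the target $ρ$ and grows as it moves away in either direction, whereas the $L_{1}$ and $L_{2}$ penalties only shrink the average activations toward zero.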

B. The Proof of the Convexity of $f (V)$

The convexity of $f (V)$ depends on whether its Hessian matrix $\nabla^{2} f (V)$ is positive definite or not. Thus, as long as $\nabla^{2} f (V)$ is positive definite, the convexity of $f (V)$ can be guaranteed. The Hessian matrix $\nabla^{2} f (V)$ is

\nabla^{2} f (V) = \frac{1}{N} (D^{T} D + D^{{(L)}^{T}} D^{(L)}) + S

According to the Weyl's inequality [54], we have

\begin{array}{l} λ_{\min} (\nabla^{2} f (V)) = λ_{\min} (\frac{1}{N} (D^{T} D + D^{{(L)}^{T}} D^{(L)}) + S) \\ \geq λ_{\min} (\frac{1}{N} D^{T} D) + λ_{\min} (\frac{1}{N} D^{{(L)}^{T}} D^{(L)}) \\ + λ_{\min} (λ_{3} H_{2}) + λ_{\min} (- 2 λ_{3} H_{1}) \\ + λ_{\min} (λ_{4} L) + λ_{\min} ((λ_{3} + λ_{5}) I) \\ = (λ_{3} + λ_{5}) - 2 λ_{3} \\ = λ_{5} - λ_{3} \end{array}

where $λ_{\min} (\cdot)$ denotes the smallest eigenvalue of a matrix. To ensure the positive definiteness of the Hessian matrix $\nabla^{2} f (V)$, $λ_{\min} (\nabla^{2} f (V))$ should be greater than 0. Thus, $f (V)$ is convex when $λ_{3} < λ_{5}$ holds.

C. The Lipschitz Constant of the Gradient $\nabla V$

For every $V^{1}, V^{2} \in ℝ^{K \times N}$ , we have

\begin{array}{l} \| \nabla V^{1} - \nabla V^{2} \|_{2} = \| \frac{1}{N} (D^{T} D + D^{{(L)}^{T}} D^{(L)}) (V^{1} - V^{2}) + (V^{1} - V^{2}) S \|_{2} \\ \leq \| \frac{1}{N} (D^{T} D + D^{{(L)}^{T}} D^{(L)}) \|_{2} \| V^{1} - V^{2} \|_{2} + \| V^{1} - V^{2} \|_{2} \| S \|_{2} \\ \leq (\frac{1}{N} (\| D^{T} D \|_{2} + \| D^{{(L)}^{T}} D^{(L)} \|_{2}) + \| S \|_{2}) \| V^{1} - V^{2} \|_{2} \\ = (\frac{λ_{\max} (D^{T} D) + λ_{\max} (D^{{(L)}^{T}} D^{(L)})}{N} + \sqrt{λ_{\max} (S^{T} S)}) \| V^{1} - V^{2} \|_{2} \end{array}

Thus, the Lipschitz constant of the gradient $\nabla V$ is

L_{f} = \frac{1}{N} (λ_{\max} (D^{T} D) + λ_{\max} (D^{{(L)}^{T}} D^{(L)})) + \sqrt{λ_{\max} (S^{T} S)} \qquad (13)

where $λ_{\max} (\cdot)$ denotes the largest eigenvalue of a matrix.
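The bound can be checked numerically on random matrices, as in the sketch below. The function names and the toy dimensions are hypothetical, and `grad_linear_part` keeps only the part of the gradient that depends on $V$, as it appears in the derivation above; terms that are constant in $V$ cancel in the difference of two gradients.

```python
import numpy as np

def lipschitz_constant(D, D_L, S, N):
    """L_f = (lmax(D^T D) + lmax(D_L^T D_L)) / N + sqrt(lmax(S^T S))."""
    lam_D = np.linalg.eigvalsh(D.T @ D)[-1]       # eigenvalues come in ascending order
    lam_DL = np.linalg.eigvalsh(D_L.T @ D_L)[-1]
    norm_S = np.sqrt(np.linalg.eigvalsh(S.T @ S)[-1])
    return (lam_D + lam_DL) / N + norm_S

def grad_linear_part(V, D, D_L, S, N):
    """V-dependent part of the gradient used in the derivation above."""
    return (D.T @ D + D_L.T @ D_L) @ V / N + V @ S

# Toy sizes (hypothetical): p features, r encoding units, K atoms, N samples.
rng = np.random.default_rng(0)
p, r, K, N = 12, 6, 4, 10
D, D_L = rng.standard_normal((p, K)), rng.standard_normal((r, K))
S = rng.standard_normal((N, N))
V1, V2 = rng.standard_normal((K, N)), rng.standard_normal((K, N))

L_f = lipschitz_constant(D, D_L, S, N)
lhs = np.linalg.norm(grad_linear_part(V1, D, D_L, S, N)
                     - grad_linear_part(V2, D, D_L, S, N), 2)
rhs = L_f * np.linalg.norm(V1 - V2, 2)
print(lhs <= rhs)
```

In proximal-gradient schemes such a constant is commonly used to set the step size to $1 / L_{f}$; whether that is done here is not stated in this appendix.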

D. The Details of Obtaining Multimodal Data

There are 264 BOLD signals with $T_{m}$ time points for the $m$-th modality of the $n$-th subject. The functional connectivity $fc_{nk}^{(m)} (i, j)$ between the $i$-th ROI and the $j$-th ROI within the $k$-th window for the $m$-th modality of the $n$-th subject is calculated as the Pearson correlation coefficient, which is defined as follows

fc_{nk}^{(m)} (i, j) = \frac{\sum_{t = 1}^{w_{l}} (B_{nk}^{(m)} (i, t) - {\bar{B}}_{nk}^{(m)} (i)) (B_{nk}^{(m)} (j, t) - {\bar{B}}_{nk}^{(m)} (j))}{\sqrt{\sum_{t = 1}^{w_{l}} {(B_{nk}^{(m)} (i, t) - {\bar{B}}_{nk}^{(m)} (i))}^{2}} \sqrt{\sum_{t = 1}^{w_{l}} {(B_{nk}^{(m)} (j, t) - {\bar{B}}_{nk}^{(m)} (j))}^{2}}}

where $B_{nk}^{(m)} (i, t)$ is the $t$-th BOLD signal value of the $i$-th ROI within the $k$-th window for the $m$-th modality of the $n$-th subject, and ${\bar{B}}_{nk}^{(m)} (i) = \frac{1}{w_{l}} \sum_{t = 1}^{w_{l}} B_{nk}^{(m)} (i, t)$ is the sample mean of the BOLD signals of the $i$-th ROI within the same window. By calculating the functional connectivity between every pair of ROIs within the $k$-th window, $C_{264}^{2} = 34716$ functional connections are obtained for that window. For BOLD signals with $T$ time points, we obtain $S_{l} = \frac{T - w_{l}}{s_{l}} + 1$ windows with window length $w_{l}$ and scan length $s_{l}$. Thus, a dynamic functional connection matrix $fc_{n}^{(m)} \in ℝ^{34716 \times S_{l}}$ is obtained for the $m$-th modality of the $n$-th subject. Let $X_{(m)} = (fc_{1}^{(m)}, fc_{2}^{(m)}, \dots, fc_{N_{s}}^{(m)}) \in ℝ^{p \times N_{m}}$ be the data of the $m$-th modality; the multimodal data $X = (X_{(1)}, X_{(2)}, \dots, X_{(M)}) \in ℝ^{p \times N}$ are composed of $M$ modalities. Here, $p = C_{264}^{2} = 34716$, $N_{m} = S_{l} \times N_{s}$ with $N_{s}$ being the number of subjects, and $N = \sum_{m = 1}^{M} N_{m}$. The flowchart of calculating $fc_{n}^{(m)}$ is shown in Figure 10 of the Supplementary material.
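The windowed computation described above can be sketched with NumPy as follows. The function name `sliding_window_dfc` and the toy sizes are assumptions made for illustration (the paper uses 264 ROIs), and integer division is used for the number of windows, which coincides with the formula above when $s_{l} = 1$.

```python
import numpy as np

def sliding_window_dfc(bold, w_l, s_l=1):
    """bold: (R, T) array of ROI time courses.
    Returns an (R*(R-1)/2, S_l) matrix; column k holds the upper-triangular
    Pearson correlations computed inside the k-th window."""
    R, T = bold.shape
    n_windows = (T - w_l) // s_l + 1
    iu = np.triu_indices(R, k=1)                 # each ROI pair (i < j) once
    dfc = np.empty((len(iu[0]), n_windows))
    for k in range(n_windows):
        window = bold[:, k * s_l : k * s_l + w_l]
        dfc[:, k] = np.corrcoef(window)[iu]      # rows of `window` are ROIs
    return dfc

# Toy example: 6 ROIs and 40 time points, window length 14, scan length 1.
rng = np.random.default_rng(0)
bold = rng.standard_normal((6, 40))
dfc = sliding_window_dfc(bold, w_l=14, s_l=1)
print(dfc.shape)  # 15 ROI pairs, 27 windows
```

With 264 ROIs the same code would return $264 \times 263 / 2 = 34716$ rows, matching the dimension $p$ used throughout the paper.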

Contributor Information

Lan Yang, School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049 P.R. China.

Chen Qiao, School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049 P.R. China.

Huiyu Zhou, School of Computing and Mathematical Sciences, University of Leicester, United Kingdom.

Vince D. Calhoun, Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA 30030

Julia M. Stephen, Mind Research Network, Albuquerque, NM 87106

Tony W. Wilson, Institute for Human Neuroscience, Boys Town National Research Hospital, Boys Town, NE 68010

Yuping Wang, Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118 USA.

REFERENCES

[1].Edde M et al. “Functional brain connectivity changes across the human life span: From fetal development to old age,” Journal of Neuroscience Research, vol. 99, no. 1, pp. 236–262, 2021. [DOI] [PubMed] [Google Scholar]
[2].Hu W et al. “Deep collaborative learning with application to the study of multimodal brain development,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 12, pp. 3346–3359, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
[3].Jolles DD et al. “A comprehensive study of whole-brain functional connectivity in children and young adults,” Cerebral Cortex, vol. 21, no. 2, pp. 385–391, 062010. [DOI] [PubMed] [Google Scholar]
[4].Qiao C et al. “Sparse deep dictionary learning identifies differences of time-varying functional connectivity in brain neuro-developmental study,” Neural Networks, vol. 135, pp. 91–104, 2021. [DOI] [PubMed] [Google Scholar]
[5].Cai B et al. “Capturing dynamic connectivity from resting state fmri using time-varying graphical lasso,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 7, pp. 1852–1862, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
[6].Sun Y et al. “Brain state-dependent dynamic functional connectivity patterns in attention-deficit/hyperactivity disorder,' Journal of Psychiatric Research, vol. 138, pp. 569–575, 2021. [DOI] [PubMed] [Google Scholar]
[7].Cai B et al. “Estimation of dynamic sparse connectivity patterns from resting state fmri,” IEEE Transactions on Medical Imaging, vol. 37, no. 5, pp. 1224–1234, 2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
[8].Choe AS et al. “Comparing test-retest reliability of dynamic functional connectivity methods,” NeuroImage, vol. 158, pp. 155–175, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
[9].Rahim M et al. “Integrating multimodal priors in predictive models for the functional characterization of alzheimer's disease,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, Navab N, Hornegger J, Wells WM, and Frangi A, Eds. Cham: Springer International Publishing, 2015, pp. 207–214.
[10].Li Q et al. “Classification of alzheimer's disease, mild cognitive impairment, and cognitively unimpaired individuals using multi-feature kernel discriminant dictionary learning,” Frontiers in Computational Neuroscience, vol. 11, pp. 1–14, 2018.
[11].Li Q et al. “Multi-modal discriminative dictionary learning for alzheimer's disease and mild cognitive impairment,” Computer Methods and Programs in Biomedicine, vol. 150, pp. 1–8, 2017.
[12].Xiao L et al. “A manifold regularized multi-task learning model for iq prediction from two fmri paradigms,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 3, pp. 796–806, 2020.
[13].Liu F et al. “Inter-modality relationship constrained multi-modality multi-task feature selection for alzheimer's disease and mild cognitive impairment identification,” NeuroImage, vol. 84, pp. 466–475, 2014.
[14].Hu D et al. “Disentangled-multimodal adversarial autoencoder: Application to infant age prediction with incomplete multimodal neuroimages,” IEEE Transactions on Medical Imaging, vol. 39, no. 12, pp. 4137–4149, 2020.
[15].Abreu R et al. “Identification of epileptic brain states by dynamic functional connectivity analysis of simultaneous EEG-fMRI: a dictionary learning approach,” Scientific Reports, vol. 9, no. 1, p. 638, 2019.
[16].Li H et al. “Multi-modality low-rank learning fused first-order and second-order information for computer-aided diagnosis of schizophrenia,” in Intelligence Science and Big Data Engineering. Big Data and Machine Learning, Cui Z, Pan J, Zhang S, Xiao L, and Yang J, Eds. Cham: Springer International Publishing, 2019, pp. 356–368.
[17].D'Souza N et al. “Deep sr-ddl: Deep structurally regularized dynamic dictionary learning to integrate multimodal and dynamic functional connectomics data for multidimensional clinical characterizations,” NeuroImage, vol. 241, p. 118388, 2021.
[18].Shao W et al. “Hypergraph based multi-task feature selection for multimodal classification of alzheimer's disease,” Computerized Medical Imaging and Graphics, vol. 80, p. 101663, 2020.
[19].Di D et al. “Hypergraph learning for identification of covid-19 with ct imaging,” Medical Image Analysis, vol. 68, p. 101910, 2021.
[20].Wang C et al. “High-level attributes modeling for indoor scenes classification,” Neurocomputing, vol. 121, pp. 337–343, 2013, advances in Artificial Neural Networks and Machine Learning.
[21].Schölkopf B et al. Learning with Hypergraphs: Clustering, Classification, and Embedding. MIT Press, 2007, pp. 1601–1608.
[22].Jin R et al. “Dictionary learning-based fmri data analysis for capturing common and individual neural activation maps,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1265–1279, 2020.
[23].Zhang Z et al. “Trace ratio optimization-based semi-supervised nonlinear dimensionality reduction for marginal manifold visualization,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1148–1161, 2013.
[24].Yang M et al. “Fisher discrimination dictionary learning for sparse representation,” in 2011 International Conference on Computer Vision, 2011, pp. 543–550.
[25].Zhou Y and Barner KE, “Locality constrained dictionary learning for nonlinear dimensionality reduction,” IEEE Signal Processing Letters, vol. 20, no. 4, pp. 335–338, 2013.
[26].Beck A and Teboulle M, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Img. Sci., vol. 2, no. 1, pp. 183–202, Mar. 2009.
[27].Satterthwaite TD et al. “Neuroimaging of the philadelphia neurodevelopmental cohort,” NeuroImage, vol. 86, pp. 544–553, 2014.
[28].Power JD et al. “Functional network organization of the human brain,” Neuron, vol. 72, no. 4, pp. 665–678, 2011.
[29].Shakil S et al. “Evaluation of sliding window correlation performance for characterizing dynamic functional connectivity and brain states,” NeuroImage, vol. 133, pp. 111–128, 2016.
[30].Sakoğlu Ü et al. “A method for evaluating dynamic functional network connectivity and task-modulation: application to schizophrenia,” Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 23, no. 5, pp. 351–366, 2010.
[31].Calhoun V et al. “The chronnectome: Time-varying connectivity networks as the next frontier in fmri data discovery,” Neuron, vol. 84, no. 2, pp. 262–274, 2014.
[32].Engan K et al. “Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation,” Digital Signal Processing, vol. 17, no. 1, pp. 32–49, 2007.
[33].Abas M et al. “Agarwood oil quality classification using support vector classifier and grid search cross validation hyperparameter tuning,” International Journal of Emerging Trends in Engineering Research, vol. 8, no. 6, pp. 2551–2556, 2020.
[34].Saud S et al. “Performance improvement of empirical models for estimation of global solar radiation in india: A k-fold cross-validation approach,” Sustainable Energy Technologies and Assessments, vol. 40, p. 100768, 2020.
[35].Xu D et al. “Convergence of the rmsprop deep learning method with penalty for nonconvex optimization,” Neural Networks, vol. 139, pp. 17–23, 2021.
[36].Bahrampour S et al. “Multimodal task-driven dictionary learning for image classification,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 24–38, 2016.
[37].Kanungo T et al. “An efficient k-means clustering algorithm: analysis and implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.
[38].Cohen J, “A coefficient of agreement for nominal scales,” Educational & Psychological Measurement, vol. 20, pp. 37–46, 1960.
[39].Cole M et al. “Intrinsic and task-evoked network architectures of the human brain,” Neuron, vol. 83, no. 1, pp. 238–251, 2014.
[40].Hughes C et al. “Aging relates to a disproportionately weaker functional architecture of brain networks during rest and task states,” NeuroImage, vol. 209, p. 116521, 2020.
[41].Bullmore E and Sporns O, “The economy of brain network organization,” Nature Reviews Neuroscience, vol. 13, no. 5, pp. 336–349, 2012.
[42].Spreng RN et al. “The common neural basis of autobiographical memory, prospection, navigation, theory of mind, and the default mode: A quantitative meta-analysis,” J Cogn Neurosci, vol. 21, no. 3, pp. 489–510, 2009.
[43].Kang J et al. “Energy landscape analysis of the subcortical brain network unravels system properties beneath resting state dynamics,” NeuroImage, vol. 149, pp. 153–164, 2017.
[44].St. Jacques PL et al. “Dynamic neural networks supporting memory retrieval,” NeuroImage, vol. 57, no. 2, pp. 608–616, 2011.
[45].Bostan AC et al. “Cerebellar networks with the cerebral cortex and basal ganglia,” Trends in Cognitive Sciences, vol. 17, no. 5, pp. 241–254, 2013.
[46].Smith SM et al. “Correspondence of the brain's functional architecture during activation and rest,” Proceedings of the National Academy of Sciences, vol. 106, no. 31, pp. 13040–13045, 2009.
[47].Leaver AM et al. “Dysregulation of limbic and auditory networks in tinnitus,” Neuron, vol. 69, no. 1, pp. 33–43, 2011.
[48].Pillay S et al. “Perceptual demand and distraction interactions mediated by task-control networks,” NeuroImage, vol. 138, pp. 141–146, 2016.
[49].Menon V, “Salience network,” in Brain Mapping, Toga AW, Ed. Waltham: Academic Press, 2015, pp. 597–611.
[50].Londei A et al. “Sensory-motor brain network connectivity for speech comprehension,” Human Brain Mapping, vol. 31, no. 4, pp. 567–580, 2010.
[51].Sheffield JM et al. “Fronto-parietal and cingulo-opercular network integrity and cognition in health and schizophrenia,” Neuropsychologia, vol. 73, pp. 82–93, 2015.
[52].Gao W et al. “The synchronization within and interaction between the default and dorsal attention networks in early infancy,” Cerebral Cortex, vol. 23, no. 3, pp. 594–603, Feb. 2012.
[53].Jiang R et al. “Task-induced brain connectivity promotes the detection of individual differences in brain-behavior relationships,” NeuroImage, vol. 207, p. 116370, 2020.
[54].Merikoski JK and Kumar R, “Inequalities for spreads of matrix sums and products,” Applied Mathematics E-Notes [electronic only], vol. 4, pp. 150–159, 2004.

Associated Data

Supplementary Materials

supp1-3244921

NIHMS1918682-supplement-supp1-3244921.pdf (4.2 MB, pdf)
