Brain Informatics. 2015 Feb 3;2(2):53–63. doi: 10.1007/s40708-015-0011-5

Local dimension-reduced dynamical spatio-temporal models for resting state network estimation

Gilson Vieira 1, Edson Amaro 2, Luiz A Baccalá 3
PMCID: PMC4883146  PMID: 27747482

Abstract

To overcome the limitations of independent component analysis (ICA), today’s most popular analysis tool for investigating whole-brain spatial activation in resting state functional magnetic resonance imaging (fMRI), we present a new class of local dimension-reduced dynamical spatio-temporal models which dispenses with the independence assumptions that severely limit deeper connectivity descriptions between spatial components. The new method combines novel concepts of group sparsity with contiguity-constrained clusterization to produce physiologically consistent regions of interest in illustrative fMRI data, whose causal interactions may then be easily estimated, something impossible under the usual ICA assumptions.

Keywords: Resting state fMRI, Dynamical spatio-temporal models, Brain connectivity, Sparsity

Introduction

There is an ever-growing and pressing need to describe accurately how brain regions are dynamically interrelated in resting state fMRI [4]. Because of the nature of BOLD signals, resting state interactions cannot be split into separate space and time descriptions, especially if the focus lies on characterizing spatial changes associated with a small number of regions of interest. The chief challenge is that any dynamical spatio-temporal model (DSTM) of fMRI datasets demands many parameters to describe what is also a large number of observed variables which, nonetheless, enjoy a great deal of spatial redundancy [3, 5, 37]. Estimating the spatial origin of signal variability from relatively short records using DSTMs is problematic, especially under the unfavourable signal-to-noise ratio (SNR) conditions that are the rule in practice [8, 24, 28, 34].

To circumvent the limitations of modelling high-dimensional systems, Wikle and Cressie [33] proposed dimension-reduced DSTMs aimed at capturing nonstationary spatial dependence under optimal state representations using Kalman filtering. In their DSTM formulation, they invoke an a priori defined orthogonal basis to expand the redistribution kernel of a discrete-time/continuous-space linear integro-difference equation (IDE) in terms of a finite linear combination of spatial components [33]. This idea was further supported in [14] and extended in [26], which considered parametrized redistribution kernels of arbitrary shape that meet homogeneity conditions in both space and time. Even though the basis changes of [33] improve the understanding of high-dimensional processes, they by no means ensure sparse solutions, which are key to achieving statistically robust dynamical descriptions.

Model robustness has alternatively been sought by indirect means: for example, through LASSO regression [29] and basis pursuit [6] for model selection and denoising, through sparse component analysis for blind source separation [39], and through iterative thresholding algorithms for image deconvolution and reconstruction [12, 17]. The latter methods seek sparsity by minimizing a penalized loss function that trades off goodness of fit against the number of basis elements that make up the signal. Recently, more attention has been given to group sparsity, where groups of variables are selected/shrunken simultaneously rather than individually (for a review see [2]). This is achieved by minimizing an objective function comprising a quadratic error term plus a regularization term that encodes a priori beliefs or data-driven structure to induce group sparsity [35, 36, 38].

The present paper extends the results in [31] on local dimension-reduced DSTMs (LDSTMs), involving state-space formulations suited to datasets of high dimensionality such as fMRI. LDSTMs take advantage of a sparsifying spatial wavelet transformation to represent the data through fewer significant parameters, which are then combined via sparsity and contiguity-constrained clustering to initialize the observation matrix and sources of a tailored expectation maximization (EM) algorithm. The main assumptions here are that the system is overdetermined (there exist more observed signals than sources) and that the columns of the observation matrix act as point-spreading functions (see Sect. 2). Finally, results are gauged using simulated data (Sect. 4), followed by a further illustration of directed connectivity disclosure using real resting state fMRI data.

Problem formulation

DSTM problems may be formulated as state-space models (see [9] for a comprehensive review of DSTM) where space-related measurements zt depend on the dynamical evolution of a suitably defined source vector xt through a linear gaussian model

x_t = \sum_{l=1}^{L} H_l x_{t-l} + w_t,  (1)

z_t = A x_t + v_t,  (2)

where z_t is an M-dimensional column vector of observed signals at time t, x_t is a K-dimensional column vector of unknown sources, A is an unknown M × K observation matrix, H_l for 1 ≤ l ≤ L are unknown K × K matrices that describe source dynamics, w_t is an innovation process and v_t is additive noise. Both w_t and v_t are assumed zero-mean Gaussian with covariances Q and R, respectively. The H_l matrices, the observation matrix A, together with Q, R and x_t, must be inferred from z_t. For added generality, Eq. (1) is presented in a slightly extended form compared to the corresponding model in [31].
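
As a concrete illustration, the generative model (1)–(2) can be simulated in a few lines. The dimensions, dynamics matrix and covariances below are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, L, T = 16, 3, 1, 200                 # illustrative dimensions only

A = rng.standard_normal((M, K))            # observation matrix of Eq. (2)
H = [np.diag([0.5, 0.3, -0.4])]            # H_1: stable source dynamics for Eq. (1)
Q = 0.1 * np.eye(K)                        # innovation covariance
R = 0.5 * np.eye(M)                        # observation-noise covariance

x = np.zeros((T, K))
z = np.zeros((T, M))
for t in range(T):
    pred = sum(H[l] @ x[t - l - 1] for l in range(min(L, t)))
    x[t] = pred + rng.multivariate_normal(np.zeros(K), Q)      # Eq. (1)
    z[t] = A @ x[t] + rng.multivariate_normal(np.zeros(M), R)  # Eq. (2)
```

The estimation problem of the paper is the inverse of this loop: recovering A, H_l, Q, R and x_t from z alone.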

Under the latter premises, the log-likelihood of model (1, 2) is given by

\log p(x, A, R, H_1, \ldots, H_L, Q \mid z) = -\frac{T}{2}\log|R| - \frac{1}{2}\sum_{t=1}^{T}(z_t - Ax_t)^{T}R^{-1}(z_t - Ax_t) - \frac{T-1}{2}\log|Q| - \frac{1}{2}\sum_{t=L+1}^{T}\left(x_t - \sum_{l=1}^{L}H_l x_{t-l}\right)^{T}Q^{-1}\left(x_t - \sum_{l=1}^{L}H_l x_{t-l}\right),  (3)

where z = vec(z_1 ⋯ z_T), x = vec(x_1 ⋯ x_T) and vec stands for the column stacking operator [27].

The EM algorithm has long been the favourite tool to solve (1, 2) for x_t because iteration on (3) is guaranteed to converge to at least a local maximum [13, 27]. The traditional EM algorithm starts with randomly generated solutions for all parameters and then re-iterates its two main steps until the maximum of (3) is attained. It begins with the E-step, where the unknown x_t are replaced by their expected values given the data and the current model parameter estimates. Under Gaussian assumptions, the expected x_t are obtained via the Rauch–Tung–Striebel (RTS) smoother [25]. In the second step, the M-step, model parameters are estimated by maximizing the conditional expected likelihood from the previous E-step. In practice, EM performance degrades rapidly for high-dimensional systems under (1, 2). Its solution may even become indeterminate, and improper initialization often deteriorates estimate quality.
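
For readers unfamiliar with the E-step machinery, a bare-bones sketch of the Kalman filter followed by the RTS backward pass for the L = 1 case might look as follows (an illustration of the smoother only, not the full EM implementation used in the paper):

```python
import numpy as np

def rts_smoother(z, A, H1, Q, R, x0, P0):
    """E-step for model (1)-(2) with L = 1: forward Kalman filter,
    then Rauch-Tung-Striebel backward smoothing of the state means."""
    T, K = len(z), len(x0)
    xp = np.zeros((T, K)); Pp = np.zeros((T, K, K))   # predicted mean/cov
    xf = np.zeros((T, K)); Pf = np.zeros((T, K, K))   # filtered mean/cov
    x_prev, P_prev = x0, P0
    for t in range(T):                                 # forward pass
        xp[t] = H1 @ x_prev
        Pp[t] = H1 @ P_prev @ H1.T + Q
        S = A @ Pp[t] @ A.T + R                        # innovation covariance
        Kt = Pp[t] @ A.T @ np.linalg.inv(S)            # Kalman gain
        xf[t] = xp[t] + Kt @ (z[t] - A @ xp[t])
        Pf[t] = Pp[t] - Kt @ A @ Pp[t]
        x_prev, P_prev = xf[t], Pf[t]
    xs = xf.copy()
    for t in range(T - 2, -1, -1):                     # backward RTS pass
        J = Pf[t] @ H1.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
    return xs
```

In a full EM loop, the smoothed means (and the corresponding covariances, omitted here for brevity) feed the M-step updates of A, H_1, Q and R.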

To achieve robust EM solutions, we take into account two common neuroscientist concerns as to what constitutes meaningful brain activity components: (a) x_t should be an economic (i.e. compact/low-dimensional) dynamical representation of the resting state fMRI dataset as a whole and (b) solutions must be spatially localized, i.e. their associated activation areas mathematically reflect point-spreading functions. We show that the latter assumptions allow estimating not only the parameters of (1, 2) but also x_t, using the simpler Local Sparse Component Analysis discussed in [32] applied to z_t. A nutshell description of the present algorithm is given in Fig. 1. The aim is to find initial estimators for the observation matrix and system states, which are then used to initialize an EM algorithm for the maximization of (3).

Fig. 1.


The main algorithm consists of (i) the application of a sparsifying spatial wavelet transformation, resulting in a description in terms of wavelet coefficient time series, (ii) contiguity-constrained clustering of the wavelet coefficient time series by grouping only nearby coefficients and (iii) estimation of the observation matrix and system states by linear dimensionality reduction of the identified clusters

Algorithm details

Sparsifying spatial wavelet transformation

Given {ϕ_m}_{1 ≤ m ≤ M}, a wavelet basis in R^M, the first step is to calculate the wavelet representation of the matrix of observations Z = (z_{m,t})_{m,t} for 1 ≤ m ≤ M and 1 ≤ t ≤ T

\hat{Z} = (\hat{z}_{m,t})_{m,t} = (\langle z_t, \phi_m \rangle)_{m,t} = \Phi Z,  (4)

where Φ is the M × M orthonormal matrix whose rows are the ϕ_m’s. With obvious notation, Z = S + V, where S = AX, and Ẑ = Ŝ + V̂. The transform Φ should be chosen such that a tailored clustering of the rows of Ŝ provides the elements that approximate the rows of X. Before this step, however, Ŝ must be estimated using the sparsity assumption, which implies finding a sparse representation of Ẑ that captures its intrinsic degrees of freedom.

By assuming that s_t = Ax_t admits a sparse representation lying in B^s_{1,1}, a particular kind of Besov space [23], the approximation of z_t by s_t ∈ B^s_{1,1} can be expressed by adding to ‖z_t − s_t‖²₂ a penalization term requiring that ‖s_t‖_{s,1}, the B^s_{1,1} norm of s_t, be small. In other words, we want to minimize the following function:

f(s_t) = \|z_t - s_t\|_2^2 + \|s_t\|_{s,1} = \|z_t - s_t\|_2^2 + \sum_m \lambda_m |\hat{s}_{m,t}|,  (5)

where ŝ_{m,t} = ⟨s_t, ϕ_m⟩ and λ_m > 0 for 1 ≤ m ≤ M are regularization parameters [12].

For each t, the above function is coercive and strictly convex which means that it has a unique global minimum. If λm=λ, the minimum value of (5) is obtained via the soft-thresholding operator [15]

\hat{s}_{m,t} = \mathrm{sign}(\hat{z}_{m,t})\,\max(|\hat{z}_{m,t}| - \lambda,\, 0).  (6)
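
The soft-thresholding operator of Eq. (6) is one line of code; a straightforward sketch:

```python
import numpy as np

def soft_threshold(z_hat, lam):
    """Elementwise soft-thresholding: shrink towards zero by lam,
    zeroing coefficients whose magnitude is below lam (Eq. 6)."""
    return np.sign(z_hat) * np.maximum(np.abs(z_hat) - lam, 0.0)
```

For example, `soft_threshold(np.array([3.0, -0.5, 1.2]), 1.0)` yields `[2.0, 0.0, 0.2]`.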

Since ŝ_{m,t} can be zero for some values of t but not for others, the estimator (6) does not ensure sparsity of s_t over time, even for large λ values. To overcome this problem, we propose tying the ŝ_{m,t} for 1 ≤ t ≤ T together and using a recently introduced group-separable regularizer for the functional (5), but in the wavelet domain

\min_{\hat{s}_m}\ \frac{1}{2}\|\hat{z}_m - \hat{s}_m\|_2^2 + \lambda_m\|\hat{s}_m\|_2,  (7)

where ẑ_m and ŝ_m are the m-th rows of Ẑ and Ŝ, respectively. Given λ_m, solving (7) is achieved by the vector soft-thresholding operator [7, 35]

\hat{s}_m = \frac{\max(\|\hat{z}_m\|_2 - \lambda_m,\, 0)}{\|\hat{z}_m\|_2}\,\hat{z}_m.  (8)
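
A direct sketch of the vector (group) soft-thresholding operator of Eq. (8), which shrinks a whole row at once and zeroes it when its norm falls below the threshold:

```python
import numpy as np

def vector_soft_threshold(z_hat_m, lam_m):
    """Group soft-thresholding of Eq. (8): scale the row z_hat_m by
    max(||z_hat_m|| - lam_m, 0) / ||z_hat_m||."""
    norm = np.linalg.norm(z_hat_m)
    if norm == 0.0:
        return np.zeros_like(z_hat_m)
    return max(norm - lam_m, 0.0) / norm * z_hat_m
```

Unlike the scalar operator (6), this either keeps a coefficient's entire time series (shrunken) or discards it completely, which is what enforces sparsity of s_t over time.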

In practice, we still need to estimate λ_m in (8) for signal denoising. Since Φ is orthogonal, if R = σ²I_{M×M}, then v̂_m ∼ N(0, σ²I_{T×T}), where v̂_m is the m-th row of V̂. For very large datasets this assumption is quite strong, but it is commonly employed in the literature. As z_t is sparse under Φ, most of the {ŝ_{m,t}}_m must be zero. Provided that fifty percent of the {ŝ_{m,t}}_m are zero, the following unbiased estimator for σ² can be defined

\hat{\sigma}^2 = \underset{m}{\mathrm{median}}\ \widehat{\mathrm{VAR}}\{\hat{z}_{m,t}\},  (9)

where VAR̂ denotes the temporal sample variance.

If VAR{ŝ_{m,t}} = 0, the ẑ_{m,t} are i.i.d. normal variables, so

\frac{(N-1)\,\widehat{\mathrm{VAR}}\{\hat{z}_{m,t}\}}{\sigma^2} \sim \chi^2_{N-1}  (10)

implies that a (1 − α) confidence interval for σ² is given by

\left[\frac{(N-1)\hat{\sigma}^2}{\chi^2_{1-\alpha/2,\,N-1}},\ \frac{(N-1)\hat{\sigma}^2}{\chi^2_{\alpha/2,\,N-1}}\right],  (11)

where χ²_{ξ,ν} is the ξ-th percentile of the chi-square distribution with ν degrees of freedom. Since ‖ẑ_m‖²₂ = (N − 1)VAR̂{ẑ_{m,t}}, (11) leads to λ_m given by

\lambda_m = \sqrt{\frac{N-1}{2}\,\hat{\sigma}^2\,\chi^2_{\alpha/2,\,N-1}},  (12)

with α=0.05/M.
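
The noise-level and threshold estimation of Eqs. (9)–(12) can be sketched as below. One assumption is made explicit: we read the chi-square term as the upper-tail critical value for the squared norm of a pure-noise row (the paper's exact percentile convention in (12) may differ), so that λ_m sits just above the expected norm of a noise-only wavelet coefficient series:

```python
import numpy as np
from scipy.stats import chi2

def estimate_lambdas(Z_hat, alpha=0.05):
    """Median-based noise variance (Eq. 9) and per-row thresholds in the
    spirit of Eq. (12), with a Bonferroni-style alpha/M correction.
    Assumption: chi2 term taken as the upper-tail critical value of the
    squared row norm under the null (noise-only) hypothesis."""
    M, N = Z_hat.shape
    sigma2 = np.median(np.var(Z_hat, axis=1, ddof=1))   # Eq. (9)
    a = alpha / M                                        # corrected level
    lam = np.sqrt(sigma2 * chi2.isf(a / 2, N - 1))       # threshold on ||z_hat_m||_2
    return sigma2, np.full(M, lam)
```

Rows whose norm exceeds λ_m survive the vector soft-thresholding of (8); pure-noise rows are zeroed with high probability.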

Contiguity-constrained clustering

The next step consists of determining which wavelet coefficient time series ŝ_m are associated with each spatial component a_k x_k, where a_k is the k-th column of A and x_k is the k-th row of X. For this, we use the spatial localization assumption. As the columns of the observation matrix are point-spreading functions, they should be perfectly described by wavelet coefficients forming localized spatial patterns. In this case, each spatial component can be determined using a clustering algorithm that enforces spatial contiguity. One way of achieving this is to apply complete linkage hierarchical clustering with a dissimilarity measure that combines the temporal correlation of the time series and the physical distance between the wavelet coefficients. Complete linkage hierarchical clustering is attractive here because it yields relatively homogeneous clusters, a key property for subsequent accurate reduction of cluster dimensionality.

Clusterization begins with each ŝ_m defining a singleton cluster. At each step, it merges the pair (A, B) of clusters that minimizes the following distance function:

\mathrm{dist}(\mathcal{A}, \mathcal{B}) = \max\{\psi(\hat{s}_i, \hat{s}_j) : i \in \mathcal{A},\ j \in \mathcal{B}\},  (13)

where

\psi(\hat{s}_i, \hat{s}_j) = \begin{cases} 1, & |\bar{\phi}_i - \bar{\phi}_j| > \max(2^{l_i}, 2^{l_j}) \\ 1 - |\mathrm{cor}(\hat{s}_i, \hat{s}_j)|, & \text{otherwise,} \end{cases}  (14)

where cor(ŝ_i, ŝ_j) denotes the correlation between ŝ_i and ŝ_j, ϕ̄_i = ∫_{R^d} s|ϕ_i(s)|² ds / ∫_{R^d} |ϕ_i(s)|² ds defines the center of mass of ϕ_i, and l_i is the scale index of ϕ_i in the wavelet decomposition. Accordingly, the above dissimilarity measure combines the absolute value of the correlation coefficient and the physical distance between the wavelet coefficients. Clusterization stops when the minimal distance between clusters is larger than r (i.e. min{dist(A, B) : A, B} > r) for some appropriately chosen r, thus leading to a list of cluster memberships that characterizes the system’s spatial components.
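
One possible implementation of this contiguity-constrained complete-linkage step, sketched with SciPy's hierarchical clustering tools. The `centers` and `scales` arguments stand in for the wavelet centres of mass ϕ̄_i and scale indexes l_i (here one-dimensional for simplicity); the paper's exact dissimilarity handling may differ in detail:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def contiguity_clusters(S_hat, centers, scales, r):
    """Complete-linkage clustering under the dissimilarity of Eq. (14):
    psi = 1 for non-adjacent coefficients (so they never merge below the
    cutoff r), 1 - |correlation| otherwise."""
    M = len(S_hat)
    psi = 1.0 - np.abs(np.corrcoef(S_hat))        # 1 - |cor|, symmetric
    for i in range(M):
        for j in range(M):
            if abs(centers[i] - centers[j]) > max(2 ** scales[i], 2 ** scales[j]):
                psi[i, j] = 1.0                   # spatially non-contiguous
    np.fill_diagonal(psi, 0.0)
    Zl = linkage(squareform(psi, checks=False), method='complete')
    return fcluster(Zl, t=r, criterion='distance')
```

Rows with zero temporal variance should be excluded beforehand, since their correlation is undefined.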

Even though the dissimilarity measure (14) already establishes much of the structure that forms the spatial components of (17), one must still decide when to stop clustering by choosing an appropriate value of r. Note that dist(A, B) depends solely on the correlation between the wavelet coefficients in A and B. The Fisher z-transform of the correlation coefficient, 0.5 log_e((1 + r)/(1 − r)), has a well-known distribution whose upper limit at (1 − α/2) confidence under the null hypothesis of independence is approximately

u = z_{(1-\alpha/2)}\sqrt{1/(N-3)},  (15)

where z_{(1−α/2)} is the standard normal percentile. Hence, we set the stopping value as

r = 1 - \left|\frac{\exp(2u) - 1}{\exp(2u) + 1}\right|  (16)

for α=0.05, which interestingly allows estimating the number of spatial components with reference neither to the actual noise level nor to the number of variables, but solely depending on sample size.
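
The stopping cutoff of Eqs. (15)–(16) depends only on the sample size N; a small sketch (we apply the inverse Fisher transform, i.e. tanh, to map the confidence bound back to the correlation scale):

```python
import numpy as np
from scipy.stats import norm

def stopping_threshold(N, alpha=0.05):
    """Cluster-merging cutoff r of Eqs. (15)-(16): the Fisher-z confidence
    bound for a sample correlation under independence, mapped back to the
    correlation scale via the inverse Fisher transform."""
    u = norm.ppf(1 - alpha / 2) * np.sqrt(1.0 / (N - 3))           # Eq. (15)
    return 1.0 - abs((np.exp(2 * u) - 1) / (np.exp(2 * u) + 1))    # Eq. (16)
```

Longer records tolerate smaller correlations as evidence of dependence, so r grows towards 1 as N increases.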

Within cluster dimensionality reduction

The next step consists of estimating the observation matrix A and the system states of (1, 2) by linear dimensionality reduction of each spatial cluster identified in the previous step. After clustering the rows of Ŝ, the k-th spatial component a_k x_k can be approximated by

Y_k = \sum_{i \in I_k} \phi_i^{-1}\hat{s}_i,  (17)

where Y_k is an M × T data matrix, ϕ_i^{−1} is the i-th column of the inverse of Φ (Φ^T for orthonormal wavelet transforms) and I_k contains the indexes of the k-th cluster. We assume that the rows of Y_k have zero mean; otherwise their mean value can be removed after (17).

According to the approximation model,

Y_k = a_k x_k + E_k,  (18)

where Ek is an M×T approximation error matrix, and one must find ak and xk minimizing the approximation error

\min_{a_k, x_k} \|Y_k - a_k x_k\|_F,  (19)

where ‖·‖_F denotes the Frobenius norm.

In fact, each spatial component a_k x_k is the rank-one M × T matrix given by the leading singular triplet of Y_k, i.e.

Y_k \approx \sigma_1 u_1 v_1^{T},  (20)

where σ_1 is the largest singular value of Y_k, and u_1 and v_1 are, respectively, the left- and right-singular vectors associated with σ_1. With no loss of generality, we take the norm of a_k to equal one, leading to

a_k = u_1  (21)

x_k = \sigma_1 v_1^{T}.  (22)
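
The rank-one reduction of Eqs. (20)–(22) is a direct application of the singular value decomposition:

```python
import numpy as np

def rank_one_component(Yk):
    """Best rank-one approximation of the cluster data matrix Yk:
    a_k is the unit-norm leading left-singular vector (Eq. 21),
    x_k the leading singular value times the right one (Eq. 22)."""
    U, s, Vt = np.linalg.svd(Yk, full_matrices=False)
    a_k = U[:, 0]
    x_k = s[0] * Vt[0, :]
    return a_k, x_k
```

Note the usual sign ambiguity of the SVD: (a_k, x_k) and (−a_k, −x_k) give the same product a_k x_k.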

LDSTM parameter estimation

The remainder of the algorithm consists of applying the traditional EM algorithm for x_t estimation [27], using the estimators for x_k and a_k from the previous section to set the initial values of x_t and A. Additionally, during the iterative process, the estimation of the A matrix is modified to accommodate linear equality constraints that ensure well-localized a_k’s. This is done by solving the following least squares problem:

\min_{a_k} \|a_k x_k - Z\|_2^2 \quad \text{subject to} \quad C a_k = 0,  (23)

where C = (c_{i,j})_{i,j} is an M × M matrix with c_{i,i} = 1 if VAR(ŝ^k_{i,t}) = 0 (i.e. outside the cluster’s spatial support, so that the corresponding entries of a_k are forced to zero) and c_{i,j} = 0 otherwise.
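
Since C in (23) merely pins the entries of a_k outside the cluster's spatial support to zero, the constrained least squares reduces to unconstrained row-wise projections over the supported entries. A minimal sketch under that reading (the `support` index set is assumed to come from the cluster identified earlier):

```python
import numpy as np

def constrained_ak(x_k, Z, support):
    """Solve Eq. (23) for one component: entries of a_k outside `support`
    are constrained to zero; each supported entry is the least-squares
    projection of the corresponding row of Z onto x_k."""
    a_k = np.zeros(Z.shape[0])
    a_k[support] = Z[support] @ x_k / (x_k @ x_k)   # row-wise projection
    return a_k
```

With several components, the same idea applies per column of A while holding the remaining terms fixed.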

Numerical illustration

To examine algorithm performance under different conditions using simulated data, we created a vector time series on a discretized one-dimensional space of M = 256 points whose activity evolves over T = 500 time points. The observation matrix we used (Fig. 2a) consists of the columns of

A = [f_{80}\ f_{180}\ f_{100}],

where f_μ = [f_{1,μ}, …, f_{M,μ}]^T with f_{i,μ} = f(i − μ) and f a discretized Gaussian point-spread function. The observations were corrupted by white Gaussian noise with covariance matrix

R = \sigma^2 I_{256 \times 256},

with σ² setting the SNR level, defined as SNR = 10 log₁₀(VAR(s)/σ²), where s = vec(Ax_1 ⋯ Ax_N). The dynamics of the spatial components evolved according to a first-order autoregressive model (L = 1) with

H_1 = \begin{bmatrix} 0.5 & -0.5 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0 \end{bmatrix},

and

Q = \begin{bmatrix} 1 & 0.5 & 0 \\ 0.5 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.

Figure 2b shows the sample variance for a simulated DSTM using the above parameters under SNR = −19 dB.

Fig. 2.


a Measurement matrix A and b sample variance of the example model with N = 500 and SNR = −19 dB

We used Daubechies (D2) functions to transform the data and gauged performance over 100 Monte Carlo simulations, leading to the mean and deviation results shown in Fig. 3. Algorithm effectiveness was evaluated in terms of how well sources were recovered, as measured by their correlation with the estimated x_t, and by how well H_l and Q could be estimated, as gauged by computing the connectivity between states using Partial Directed Coherence (PDC) [1].

Fig. 3.


a Efficiency comparison between LDSTM (solid lines) and EM (dashed lines) in recovering source temporal information. Lines represent the mean correlation between the simulated hidden state xk,t and the estimated hidden state x^k,t across 100 simulations. Vertical error bars denote the 95 % confidence interval of the mean value. b Dotted lines represent the theoretical PDC of x2 towards x1 together with estimated PDC values of x2 towards x1 using LDSTM (solid) and EM (dash)

Simulation results

The mean absolute values of the correlation coefficient between the simulated and estimated sources versus SNR in Fig. 3a show that LDSTM outperforms traditional EM, with very good results for all three sources even under very unfavourable SNR. Figure 3b shows the PDC from x_2 towards x_1 for different SNR levels compared to the corresponding EM estimates. Correct PDC patterns were obtained, whose magnitude decreases as SNR decreases but whose overall shape remains.

Real FMRI data

For further illustration purposes, we used fMRI images from seven healthy volunteers under a resting state protocol (approved by the local ethical committee and under individual informed written consent).

Image data acquisition

Whole-brain fMRI images (TR = 600 ms, TE = 33 ms, 32 slices, FOV = 247 × 247 mm, matrix size 128 × 128, in-plane resolution 1.975 × 1.975 mm, slice thickness 3.5 mm with 1.8 mm gap) were acquired on a 3T Siemens system using a Multiplexed Echo Planar Imaging sequence (multi-band acceleration factor of 4) [16]. To aid the localization of functional data, high-resolution T1-weighted images were also acquired with an MPRAGE sequence (TR = 2500 ms, TE = 3.45 ms, inversion time = 1000 ms, 256 × 256 mm FOV, 256 × 256 in-plane matrix, 1 × 1 × 1 mm voxel size, 7° flip angle).

LDSTM preprocessing

Motion and slice time correction and temporal high-pass filtering (allowing fluctuations above 0.005 Hz) were carried out using FEAT v5.98. The fMRI data were aligned to the grey matter mask via FreeSurfer’s automatic registration tools (v. 5.0.0), resulting in extracted BOLD signals from regions with predominantly neuronal cell bodies. To enable group analysis by temporal concatenation of the participants’ fMRIs, individual grey matter images were registered to the 3-mm-thick Montreal Neurological Institute (MNI) template using a 12-parameter affine transform. To generate the spatial wavelet transformation, we used 3D Daubechies (D2) functions up to level 3. The model order for the dynamical component in (1) was defined by the Akaike information criterion.

ICA processing

To compare the LDSTM components with ICA, PICA was performed by multi-session temporal concatenation group ICA (using MELODIC in FSL). Preprocessing included slice time correction, motion correction, skull stripping, spatial smoothing (FWHM of 5 mm) and temporal high-pass filtering (allowing fluctuations above 0.005 Hz). The functional images were aligned to standard space by applying a 12 degrees-of-freedom linear affine transformation, and their time series were normalized to unit variance. The number of components was fixed at 30 to match the distinct patterns of resting state networks (RSN) usually found by other authors [4, 10].

Image results

LDSTM results

Figure 4 illustrates the advantage of wavelet transforming resting state fMRI datasets: the entropy in the image domain is much larger than in the wavelet domain, meaning that only a few wavelet coefficients are enough to account for much of the signal energy. In the example, the 10 % most energetic wavelet coefficients explain 80 % of the image energy, twice the 40 % explained by the 10 % most energetic image-domain coefficients.
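
This energy-compaction effect can be reproduced with a toy one-dimensional Haar transform (a hand-rolled orthonormal DWT for illustration, not the 3D Daubechies transform used in the paper):

```python
import numpy as np

def haar_1d(v):
    """Full orthonormal 1-D Haar transform (length must be a power of 2)."""
    v = v.astype(float).copy()
    n = v.size
    while n > 1:
        a = (v[0:n:2] + v[1:n:2]) / np.sqrt(2)   # approximation coefficients
        d = (v[0:n:2] - v[1:n:2]) / np.sqrt(2)   # detail coefficients
        v[:n // 2], v[n // 2:n] = a, d
        n //= 2
    return v

def energy_fraction(coeffs, frac=0.10):
    """Fraction of total energy carried by the top `frac` coefficients."""
    e = np.sort(coeffs.ravel() ** 2)[::-1]
    k = max(1, int(frac * e.size))
    return e[:k].sum() / e.sum()
```

For a piecewise-constant signal, the top 10 % of Haar coefficients carry nearly all the energy, whereas the top 10 % of image-domain samples carry far less, mirroring the 80 % versus 40 % figures quoted above.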

Fig. 4.


Fraction of cumulative energy in the image (green) and wavelet (red) domain for the resting state fMRI dataset. The blue vertical line crosses the fraction of cumulative energy represented by 10 % of the most energetic coefficients in the image (40 %) and wavelet (80 %) domains

LDSTM analysis identified thirty-nine well-localized spatial components comprising cortical (18), subcortical (2) and cerebellar (19) regions. Cortical and subcortical spatial components (a_k’s) are shown in Fig. 5, which includes the following anatomical areas: occipital cortex (SC1 and SC2), lateral and superior occipital gyrus (SC5, SC6 and SC20), superior temporal gyrus (SC9 and SC10), precentral gyrus (SC13 and SC14), superior parietal gyrus (SC17 and SC18), precuneus (SC3 and SC19), posterior cingulate (SC4), inferior frontal gyrus and anterior cingulate (SC7, SC8 and SC11) and thalamus (SC15 and SC16). Cerebellar regions also form well-localized bilateral activity patterns, as shown in Fig. 6.

Fig. 5.


Cortical and subcortical components identified by LDSTM

Fig. 6.


Cerebellum components identified by LDSTM

The absence of artificial stochastic model constraints permitted exposing the dynamic connectivity between the identified components. Figure 7a summarizes the connectivity network estimated using PDC applied to the reconstructed system components. In addition, PDC also highlights that resting state connectivity is present mainly at low frequencies (Fig. 7b), corroborating several studies of resting state brain connectivity [4].

Fig. 7.


FMRI resting state analysis using LDSTM. Numbers represent different components. Components numbered twice represent two components located at the same region. a Connectivity map showing components whose system states are connected via the PDC. b PDC plots for each arrow drawn in a. Dashed lines denote the 95 % confidence interval of the mean value (solid lines)

ICA results

Among the 30 component maps obtained by performing PICA across all participants, 14 components were considered artifactual due to scanner and physiological noise; their signal variances relate to cerebrospinal fluid and white matter, head motion and large vessels. Figure 8 depicts fourteen functional components related to previously reported resting state studies. They comprise the default mode network (IC2, IC9, IC10) and brain regions involved in visual (IC1, IC4), auditory/motor (IC5), sensory/motor (IC8), attentional (IC7, IC6, IC12, IC13) and executive functions (IC7, IC11, IC14). In addition, we found 2 components rarely reported in resting state studies: a cerebellum component (IC16) and a brainstem component (IC15).

Fig. 8.


ICA spatial components. The components are sorted according to their relative percentage of variance from top left to bottom right

Discussion

Local dimension-reduced modelling (LDSTM) as presented here offers an approach to source estimation and localization in resting state fMRI data analysis that dispenses with artificial stochastic model assumptions, such as those used in classical blind source separation (principal component analysis (PCA), independent component analysis (ICA) and non-negative matrix factorization (NMF)) [3, 18, 19, 21]. In addition to being sparse, the columns of the observation matrix act as point-spreading functions, which allows the system sources and their observation matrix to be identified via LSCA [32] applied to the whole fMRI dataset.

The cortical components identified by LDSTM (Fig. 5) reflect most of the data variability and coincide with traditional resting state regions observed across different individuals, data acquisition protocols and analysis techniques. They comprise the default mode network (SC8) and brain regions involved in visual (SC1, SC2, SC5, SC6), motor (SC13, SC14, SC7) and attentional functions (SC9, SC10, SC17, SC18), indicating that most of the ICA components (Fig. 8) can in fact be decomposed into several local sparse components. However, the present results draw attention to the fact that they were obtained without any additional assumption, such as source independence and/or stationarity. All that was assumed was a_k spatial localization, in line with [11]’s observation that ICA effectiveness for brain fMRI is linked to its ability to handle sparse sources rather than independent ones. This could be explained by noting that ICA preprocessing steps involve projecting the data onto a reduced-dimensional subspace via the singular value decomposition, which in turn confines the sources to regions of high signal variance.

PDC analysis shows a network where information flows from regions in the superior parietal cortex (SPC) to regions in the cerebellum (CER) and anterior cingulate. As expected, the right SPC sends information to the left CER, and the left SPC sends information to the right CER. Although the relationship between these structures is known, this result stresses two main systems engaged in the network. The connectivity between SPC and CER is in line with recent studies showing evidence of a cerebellar-parietal network involved in phonological storage [22]. In addition, visual-parietal-cerebellar interactions are expected from studies of effective connectivity using fMRI [20]. We also observe a network running from the left to the right parietal cortex passing through the posterior cingulate. Altogether, we believe that our results provide insight into how the regions of the fronto-parietal network interact, and they highlight understudied aspects of the cerebellum's role in this network during resting state.

In our model, LDSTM identified approximately 50 % of components in the cerebellum. This result is surprising, as the rate of cerebellar components identified in resting state using ICA is generally below 20 % [4]. Some of these regions seem to be related to noise sources, being located near cerebellar arteries and veins: the components SM1, SM2, SM12, SM17 and SM18 run along the superior surface of the cerebellum near the superior cerebellar veins, while the components SM8 and SM9 extend to the end of the straight sinus near the internal cerebral veins. On the other hand, the idea that the cerebellum should present as many components as the cortex is encouraging. Many recent fMRI studies have shown that different cerebellar regions are critical for processing higher-order functions in different cognitive domains, just as occurs in the cortex [30]. In these studies, it is worth noting that cerebellar clusters are always smaller than those of corresponding functionality in the cortex. We believe that some differences between ICA and LDSTM may be explained in part by the domain in which each method represents the sources.

Since spatial wavelet analysis efficiently encodes neighbourhood information via an orthogonal transformation, the present method properly addresses a number of issues involving whole-brain connectivity estimation. The first is the lack of knowledge about the spatial localization of the sources: the method provides a data-driven approach to locating the main sources of data variability, thus avoiding the effects and uncertainties of a priori region-of-interest delineation. The second is that the new method naturally employs multi-scale transformations to create a compact model of the images, a feature of growing importance as higher-resolution images become available and whose computational processing load may thereby be substantially mitigated. Finally and most importantly, unlike ICA, the method permits deeper connectivity analysis between the identified spatial components, as no independence assumption is made a priori.

Various method extensions are possible, especially regarding the choice of appropriate regularization parameters as a function of the amount of noise present in the data. In the present implementation, spatial noise is assumed homogeneous and normally distributed, which implies a chi-squared distribution for the wavelet coefficient variance. Examination of wavelet coefficient variances for real fMRI data, however, points to the need to consider heavy-tailed distributions, so a more general approach is currently being developed to estimate wavelet domain noise variance from a finite mixture of exponential distributions that could then be used to quantify the level of data sparsity.

Conclusions

Here, an EM-based algorithm was presented for LDSTM identification. By projecting high-dimensional datasets onto smoothness spaces, one can describe the system’s spatial components via a reduced number of parameters. Further dimension reduction and denoising are obtained by vector soft-thresholding under contiguity-constrained hierarchical clustering. Finally, simulation results corroborate that the new algorithm can outperform the traditional EM approach even under adverse conditions. Even with very large datasets, as in the fMRI example, LDSTM shows promise in its ability to parcellate the human brain into well-localized, physiologically plausible regions of spatio-temporal brain activation patterns.

Acknowledgments

CNPq Grants 307163/2013-0 to L.A.B. We also thank NAPNA—Núcleo de Neurociência Aplicada from the University of São Paulo—and FAPESP Grant 2005/56464-9 (CInAPCe) during which time part of this work took place.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Baccala LA, de Brito CSN, Takahashi DY, Sameshima K. Unified asymptotic theory for all partial directed coherence forms. Philos Trans R Soc A. 2013;371(1997):20120158. doi: 10.1098/rsta.2012.0158.
2. Bach F, Jenatton R, Mairal J, Obozinski G (2011) Structured sparsity through convex optimization. arXiv e-print arXiv:1109.2397.
3. Beckmann CF, Smith SM. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging. 2004;23(2):137–152. doi: 10.1109/TMI.2003.822821.
4. Biswal BB, Mennes M, Zuo X-N, Gohel S, Kelly C, Smith SM, Beckmann CF, Adelstein JS, Buckner RL, Colcombe S, Dogonowski A-M, Ernst M, Fair D, Hampson M, Hoptman MJ, Hyde JS, Kiviniemi VJ, Kötter R, Li S-J, Lin C-P, Lowe MJ, Mackay C, Madden DJ, Madsen KH, Margulies DS, Mayberg HS, McMahon K, Monk CS, Mostofsky SH, Nagel BJ, Pekar JJ, Peltier SJ, Petersen SE, Riedl V, Rombouts SARB, Rypma B, Schlaggar BL, Schmidt S, Seidler RD, Siegle GJ, Sorg C, Teng G-J, Veijola J, Villringer A, Walter M, Wang L, Weng X-C, Whitfield-Gabrieli S, Williamson P, Windischberger C, Zang Y-F, Zhang H-Y, Castellanos FX, Milham MP. Toward discovery science of human brain function. Proc Natl Acad Sci USA. 2010;107(10):4734–4739. doi: 10.1073/pnas.0911855107.
5. Blumensath T, Jbabdi S, Glasser MF, Van Essen DC, Ugurbil K, Behrens TEJ, Smith SM. Spatially constrained hierarchical parcellation of the brain with resting-state fMRI. NeuroImage. 2013;76:313–324. doi: 10.1016/j.neuroimage.2013.03.024.
6. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1998;20(1):33–61. doi: 10.1137/S1064827596304010.
7. Combettes PL, Wajs VR. Signal recovery by proximal forward-backward splitting. Multiscale Model Simul. 2005;4(4):1168–1200. doi: 10.1137/050626090.
8. Cortes J. Distributed kriged Kalman filter for spatial estimation. IEEE Trans Autom Control. 2009;54(12):2816–2827. doi: 10.1109/TAC.2009.2034192.
9. Cressie N, Wikle CK. Statistics for spatio-temporal data. Hoboken: Wiley; 2011.
10. Damoiseaux JS, Rombouts SARB, Barkhof F, Scheltens P, Stam CJ, Smith SM, Beckmann CF. Consistent resting-state networks across healthy subjects. Proc Natl Acad Sci. 2006;103(37):13848–13853. doi: 10.1073/pnas.0601417103.
11. Daubechies I, Roussos E, Takerkart S, Benharrosh M, Golden C, D’Ardenne K, Richter W, Cohen JD, Haxby J. Independent component analysis for brain fMRI does not select for independence. Proc Natl Acad Sci USA. 2009;106(26):10415–10422. doi: 10.1073/pnas.0903525106.
12. Daubechies I, Defrise M, De Mol C (2003) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv e-print math/0307152.
13. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977;39(1):1–38.
14. Dewar M, Scerri K, Kadirkamanathan V. Data-driven spatio-temporal modeling using the integro-difference equation. IEEE Trans Signal Process. 2009;57(1):83–91. doi: 10.1109/TSP.2008.2005091.
15. Donoho DL, Johnstone IM, Kerkyacharian G, Picard D. Wavelet shrinkage: asymptopia? J R Stat Soc Ser B (Methodol). 1995;57(2):301–369.
16. Feinberg DA, Moeller S, Smith SM, Auerbach E, Ramanna S, Glasser MF, Miller KL, Ugurbil K, Yacoub E. Multiplexed echo planar imaging for sub-second whole brain FMRI and fast diffusion imaging. PLoS One. 2010;5(12):e15710. doi: 10.1371/journal.pone.0015710.
17. Figueiredo MAT, Nowak RD. An EM algorithm for wavelet-based image restoration. IEEE Trans Image Process. 2003;12(8):906–916. doi: 10.1109/TIP.2003.814255.
  • 18.Friston KJ, Frith CD, Liddle PF, Frackowiak RSJ. Functional connectivity: the principal-component analysis of large (PET) data sets. J Cereb Blood Flow Metab. 1993;13(1):5–14. doi: 10.1038/jcbfm.1993.4. [DOI] [PubMed] [Google Scholar]
  • 19.Georgiev P, Theis F, Cichocki A, Bakardjian H. Sparse component analysis: a new tool for data mining. In: Pardalos PM, Boginski VL, Vazacopoulos A, editors. Data mining in biomedicine, number 7 in Springer optimization and its applications. New York: Springer; 2007. pp. 91–116. [Google Scholar]
  • 20.Kellermann T, Regenbogen C, De Vos M, Mnang C, Finkelmeyer A, Habel U. Effective connectivity of the human cerebellum during visual attention. J Neurosci. 2012;32(33):11453–11460. doi: 10.1523/JNEUROSCI.0678-12.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lohmann G, Volz KG, Ullsperger M. Using non-negative matrix factorization for single-trial analysis of fMRI data. Neuroimage. 2007;37(4):1148–1160. doi: 10.1016/j.neuroimage.2007.05.031. [DOI] [PubMed] [Google Scholar]
  • 22.Macher K, Bhringer A, Villringer A, Pleger B. Cerebellar-parietal connections underpin phonological storage. J Neurosci. 2014;34(14):5029–5037. doi: 10.1523/JNEUROSCI.0106-14.2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Mallat SG. A wavelet tour of signal processing the sparse way. Amsterdam; Boston: Elsevier/Academic Press; 2009. [Google Scholar]
  • 24.Mardia KV, Goodall C, Redfern EJ, Alonso FJ. The kriged kalman filter. Test. 1998;7(2):217–282. doi: 10.1007/BF02565111. [DOI] [Google Scholar]
  • 25.Rauch HE, Striebel CT, Tung F. Maximum likelihood estimates of linear dynamic systems. J Am Inst Aeronaut Astronaut. 1965;3(8):1445–1450. doi: 10.2514/3.3166. [DOI] [Google Scholar]
  • 26.Scerri K, Dewar M, Kadirkamanathan V. Estimation and model selection for an IDE-based spatio-temporal model. IEEE Trans Signal Process. 2009;57(2):482–492. doi: 10.1109/TSP.2008.2008550. [DOI] [Google Scholar]
  • 27.Shumway RH, Stoffer DS. An approach to time series smoothing and forecasting using the em algorithm. J Time Ser Anal. 1982;3(4):253–264. doi: 10.1111/j.1467-9892.1982.tb00349.x. [DOI] [Google Scholar]
  • 28.Stoodley CJ, Schmahmann JD. Functional topography in the human cerebellum: a meta-analysis of neuroimaging studies. Neuroimage. 2009;44(2):489–501. doi: 10.1016/j.neuroimage.2008.08.039. [DOI] [PubMed] [Google Scholar]
  • 29.Theophilides CN, Ahearn SC, Grady S, Merlino M. Identifying west nile virus risk areas: the dynamic continuous-area space-time system. Am J Epidemiol. 2003;157(9):843–854. doi: 10.1093/aje/kwg046. [DOI] [PubMed] [Google Scholar]
  • 30.Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1994;58:267–288. [Google Scholar]
  • 31.Vieira G, Amaro E, Baccala LA. Local dimension-reduced dynamical spatio-temporal models for resting state network estimation. In: Hutchison D, Kanade T, Kittler J, Kleinberg JM, Kobsa A, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Pandu Rangan C, Steffen B, Terzopoulos D, Tygar D, Weikum G, lezak D, Tan A-H, Peters JF, Schwabe L, editors. Brain informatics and health. Cham: Springer International Publishing; 2014. pp. 436–446. [Google Scholar]
  • 32.Vieira G, Amaro E, Baccala LA (2014) Local sparse component analysis for blind source separation: an application to resting state fmri. In: Proceedings of IEEE EMBS conference, IEEE [DOI] [PubMed]
  • 33.Wikle CK, Cressie N. A dimension-reduced approach to space-time kalman filtering. Biometrika. 1999;86(4):815–829. doi: 10.1093/biomet/86.4.815. [DOI] [Google Scholar]
  • 34.Woolrich MW, Jenkinson M, Michael Brady J, Smith SM. Fully bayesian spatio-temporal modeling of FMRI data. IEEE Trans Med Imaging. 2004;23(2):213–231. doi: 10.1109/TMI.2003.823065. [DOI] [PubMed] [Google Scholar]
  • 35.Wright SJ, Nowak RD, Figueiredo MAT. Sparse reconstruction by separable approximation. IEEE Trans Signal Process. 2009;57(7):2479–2493. doi: 10.1109/TSP.2009.2016892. [DOI] [Google Scholar]
  • 36.Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc. 2006;68(1):49–67. doi: 10.1111/j.1467-9868.2005.00532.x. [DOI] [Google Scholar]
  • 37.Zalesky A, Fornito A, Harding IH, Cocchi L, Ycel M, Pantelis C, Bullmore ET. Whole-brain anatomical networks: Does the choice of nodes matter? NeuroImage. 2010;50(3):970–983. doi: 10.1016/j.neuroimage.2009.12.027. [DOI] [PubMed] [Google Scholar]
  • 38.Zhao P, Rocha G (2009) The composite absolute penalties family for grouped and hierarchical variable selection. Ann Stat 37(6A):3468–3497 arXiv e-print arXiv:0909.0411
  • 39.Zibulevsky M, Pearlmutter BA. Blind source separation by sparse decomposition in a signal dictionary. Neural Comput. 2001;13(4):863–882. doi: 10.1162/089976601300014385. [DOI] [PubMed] [Google Scholar]
