Abstract
In this paper we construct an atlas that captures functional characteristics of a cognitive process from a population of individuals. The functional connectivity is encoded in a low-dimensional embedding space derived from a diffusion process on a graph that represents correlations of fMRI time courses. The atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. The atlas is not directly coupled to the anatomical space, and can represent functional networks that are variable in their spatial distribution. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects.
1 Introduction
The functional architecture of the cerebral cortex consists of regions and networks of regions that become active during specific tasks or at rest when the brain is suspected to engage in activities such as memory encoding [1]. The functional networks vary spatially across individuals due to natural variability [15], developmental processes in early childhood [9] or adulthood [4], or pathology [5]. Reorganization can appear over remarkably short periods of even a few days [4]. The relationship between the structure of functional networks and their spatial distribution is not well understood.
The traditional brain imaging paradigm in most functional MRI (fMRI) studies treats functional activity as a feature of a position within the anatomical coordinate frame. The anatomical variability in a population is typically mitigated by smoothing and non-rigid registration of the anatomical data, and the corresponding normalization of functional signals into a stereotactic space. The remaining spatial variability of functional regions is ignored. An alternative approach is to localize functional regions of interest (fROIs) in individuals or groups [15] as a precursor to analysis, and subsequently study the responses in the resulting small number of fROIs.
Both approaches limit the range of questions that can be formulated on the fMRI observations. For example, the spatial normalization framework cannot express or account for spatial variability within the population since it assumes perfect spatial correspondences when detecting networks by averaging over multiple subjects. In contrast, the fROI approach is based on detection results for each subject, which can be infeasible if the activation is weak and cannot be distinguished from noise in individual subjects without averaging over the group.
We propose a different approach to characterize functional networks in a population of individuals. We do not assume a tight coupling between anatomical location and function, but view functional signals as the basis of a descriptive map that represents the global connectivity pattern during a specific cognitive process. We develop a representation of those networks based on manifold learning techniques and show how we can learn an atlas from a population of subjects performing the same task. Our main assumption is that the connectivity pattern associated with a functional process is consistent across individuals. Accordingly, we construct a generative model (the atlas) for these connectivity patterns that describes the common structures within the population.
The clinical goal of this work is to provide additional evidence for localization of functional areas. A robust localization approach is important for neurosurgical planning if individual activations are weak or reorganization has happened due to pathologies such as tumor growth. Furthermore the method provides a basis for understanding the mechanisms underlying formation and reorganization in the cerebral system.
Related work
A spectral embedding [18] represents data points in a map that reflects a large set of pairwise affinity values in the Euclidean space. Diffusion maps establish a metric based on the concept of diffusion processes on a graph [2]. A probabilistic interpretation of diffusion maps has recently been proposed [13]. Previously demonstrated spectral methods in application to fMRI analysis mapped voxels into a space that captured joint functional characteristics of brain regions [10]. This approach represents the magnitude of co-activation by the density in the embedding. Functionally homogeneous units have been shown to form clusters in the embedding in a study of parceled resting-state fMRI data [17]. In [7], multidimensional scaling was employed to retrieve a low dimensional representation of positron emission tomography (PET) signals in a set of activated regions. In an approach closely related to the method proposed in this paper [11], an embedding of fMRI signals was used to match corresponding functional regions across different subjects. Recently, a probabilistic generative model that connects the embedding coordinates with a similarity matrix has been demonstrated in [14].
2 Generative Model of Functional Connectivity
We start by reviewing the original diffusion maps formulation. We then derive a probabilistic likelihood model for the data based on this mapping and use the model to link diffusion maps of functional connectivity across subjects.
2.1 Diffusion Distances, Diffusion Maps, and fMRI Time Courses
Given an fMRI sequence that contains N voxels, each characterized by an fMRI signal $I_i$ over T time points, we calculate a matrix W that assigns a non-negative symmetric weight w(i, j) to each pair of voxels (i, j):
\[ w(i,j) = \exp\!\left(\frac{\langle I_i, I_j \rangle}{\varepsilon}\right) \tag{1} \]
where 〈·, ·〉 is the correlation coefficient of the time courses $I_i$ and $I_j$, and ε is the weight decay. We define a graph whose vertices correspond to voxels and whose edge weights are determined by W [2,10]. In practice, we discard all edges that have a weight below a chosen threshold. This construction yields a sparse graph, which is then transformed into a Markov chain. We define the Markov transition matrix $P = D^{-1}W$, where D is a diagonal matrix such that $d_i = D(i,i) = \sum_j w(i,j)$ is the degree of node i. By interpreting the entries P(i, j) as transition probabilities, we can define the diffusion distance
\[ D_t(i,j) = \sum_{k=1}^{N} \frac{\left(P^t(i,k) - P^t(j,k)\right)^2}{d_k} \tag{2} \]
The distance is determined by the probability of traveling between vertices i and j by taking all paths of at most t steps. The transition probabilities are based on the functional connectivity of node pairs; the diffusion distance integrates the connectivity values over possible paths that connect two points and defines a geometry that captures the entirety of the connectivity structure. This distance is characterized by the operator $P^t$, the t-th power of the transition matrix. The value of the distance $D_t(i,j)$ is low if there is a large number of paths of at most length t steps with high transition probabilities between the nodes i and j.
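The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the exponential affinity `exp(corr / eps)`, the inverse-degree weighting inside the distance, and the names `transition_matrix`, `diffusion_distance`, `eps`, and `threshold` are all assumptions made for the example.

```python
import numpy as np

def transition_matrix(signals, eps=0.5, threshold=1.0):
    """Build W, node degrees d, and P = D^{-1} W from an (N, T) signal array."""
    corr = np.corrcoef(signals)          # pairwise correlation coefficients
    W = np.exp(corr / eps)               # ASSUMED affinity form, not confirmed by the text
    W[W < threshold] = 0.0               # discard weak edges to obtain a sparse graph
    # The diagonal (self-affinity exp(1/eps)) is kept, so d > 0 when threshold <= exp(1/eps).
    d = W.sum(axis=1)                    # degree of each node
    P = W / d[:, None]                   # row-normalize: each row is a probability distribution
    return W, d, P

def diffusion_distance(P, d, t):
    """Degree-weighted squared difference of t-step transition rows (ASSUMED weighting)."""
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]   # O(N^3) memory: suitable for small graphs only
    return (diff ** 2 / d).sum(axis=-1)

rng = np.random.default_rng(0)
signals = rng.standard_normal((6, 50))   # toy data: 6 "voxels", 50 time points
W, d, P = transition_matrix(signals)
D2 = diffusion_distance(P, d, t=2)
```

For whole-brain data the dense pairwise difference is impractical; the embedding coordinates introduced next give the same distances at far lower cost.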
The diffusion map coordinates $\Gamma = [\gamma_1, \gamma_2, \cdots, \gamma_N]^T$ yield a low dimensional embedding of the signal such that the resulting pairwise distances approximate diffusion distances, i.e., $\|\gamma_i - \gamma_j\|^2 \approx D_t(i,j)$ [13]. They are derived from the right eigenvectors of the transition matrix. In Appendix A we show that a diffusion map can be viewed as a solution to a least-squares problem. Specifically, we define a symmetric matrix $A = D^{-1/2} W D^{-1/2}$, and let L be the normalized graph Laplacian
\[ L_t = D^{-1/2} A^{2t} D^{-1/2}. \tag{3} \]
The embedding coordinates are then found as follows:
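\[ \Gamma_t^{*} = \arg\min_{\Gamma \in \mathbb{R}^{N \times L}} \sum_{i,j} d_i\, d_j \left( L_t(i,j) - \gamma_i^T \gamma_j \right)^2, \tag{4} \]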
where L is the dimensionality of the embedding. To simplify notation, we omit t for L and Γ in the derivations, assuming that all the results are derived for a fixed, known diffusion time.
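As a hedged illustration of how such coordinates can be computed, the sketch below uses the eigendecomposition of A and assumes the scaling $\Gamma = D^{-1/2} V_L \Lambda_L^{t}$, so that $\Gamma\Gamma^T$ reproduces $D^{-1/2} A^{2t} D^{-1/2}$ exactly when all N eigenvectors are kept. The function name `diffusion_map` and the choice to keep the leading (trivial) eigenvector are assumptions of this example.

```python
import numpy as np

def diffusion_map(W, n_dims, t):
    """Return (N, n_dims) embedding coordinates and the selected eigenvalues.

    Assumes W is symmetric with positive degrees and t is a positive integer.
    """
    d = W.sum(axis=1)
    A = W / np.sqrt(np.outer(d, d))            # A = D^{-1/2} W D^{-1/2}
    lam, V = np.linalg.eigh(A)                 # symmetric eigendecomposition
    order = np.argsort(-np.abs(lam))[:n_dims]  # keep the n_dims largest |eigenvalues|
    lam, V = lam[order], V[:, order]
    gamma = (V * lam ** t) / np.sqrt(d)[:, None]   # D^{-1/2} V_L Lambda_L^t
    return gamma, lam

rng = np.random.default_rng(1)
W = rng.random((5, 5))
W = (W + W.T) / 2                              # toy symmetric affinity matrix
gamma, lam = diffusion_map(W, n_dims=5, t=2)

# Reference matrix D^{-1/2} A^{2t} D^{-1/2} for t = 2
d = W.sum(axis=1)
A = W / np.sqrt(np.outer(d, d))
L_full = np.linalg.matrix_power(A, 4) / np.sqrt(np.outer(d, d))
```

With `n_dims` smaller than N, `gamma @ gamma.T` becomes a low-rank approximation of the same matrix, which is the sense in which the embedding solves a least-squares problem.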
2.2 A Generative Model for Diffusion Maps across Subjects
The goal of the generative model is to explain jointly the distribution of pairwise functional affinities of voxels across all subjects. We use latent variables to represent the diffusion map coordinates for S subjects indexed by s ∈ {1, . . . , S}. We can interpret Eq. (4) as maximization of a Gaussian likelihood model. We let γsi denote the embedding coordinates of voxel i in subject s and let Ls be the normalized graph Laplacian for subject s. We further assume that elements of Ls are conditionally independent given the embedding coordinates:
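\[ p\left( L_s(i,j) \mid \gamma_{si}, \gamma_{sj}; \sigma_s \right) = \mathcal{N}\!\left( L_s(i,j);\ \gamma_{si}^T \gamma_{sj},\ \frac{\sigma_s^2}{d_{si}\, d_{sj}} \right), \tag{5} \]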
where $\mathcal{N}(\cdot\,; \mu, \sigma^2)$ is a Gaussian distribution with mean μ and variance $\sigma^2$.
Note that the variance depends on the degrees di, dj, which is technically a problem since these quantities depend on the data W. We find that in practice, the method works well and leave the development of rigorous probability models for diffusion maps as an interesting future direction.
In the absence of a prior distribution on Γs, fitting this model to the data yields results similar to the conventional diffusion maps for each subject independently from the rest of the population.
The goal of this paper is to define an atlas that represents a population-wide structure of functional connectivities in the space of diffusion maps. To capture this common structure, we define a shared prior distribution on the embedding coordinates Γs for all subjects, and expect the embedded vectors to be in correspondence across subjects [11]. Here, we assume that the common distribution in the embedding space is a mixture of K Gaussian components. We let zsi ∈ {1, · · · , K} be the component assignment for voxel i in subject s and obtain the prior on the embedding coordinates of voxel i in subject s:
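\[ p\left( \gamma_{si} \mid z_{si} = k; \mu, \Theta \right) = \mathcal{N}\!\left( \gamma_{si};\ \mu_k, \Theta_k \right), \tag{6} \]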
where μk and Θk are the center and covariance matrices for component k. We let the component assignments be independently distributed according to the weights of different components, i.e.,
\[ p\left( z_{si} = k; \pi \right) = \pi_k. \tag{7} \]
Together, Eqs. (5) to (7) define the joint distribution of the embeddings Γ, the component assignments z, and the observed affinities L. The distribution is parameterized by component centers {μk}, covariance matrices {Θk}, and weights {πk}.
By adding the group prior over diffusion maps, we constrain the resulting subject maps to be aligned across subjects and further encourage them to resemble the population-wide structures characterized by the mixture model (Fig. 1). The mixture model acts as a population atlas in the embedding space.
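To make the role of the prior concrete, the following sketch draws component assignments and embedding coordinates from a K-component Gaussian mixture, which is the form the text assigns to the atlas. The function name `sample_prior` and the toy parameter values are illustrative assumptions only.

```python
import numpy as np

def sample_prior(n_voxels, pi, mu, theta, rng):
    """Draw z_i ~ Categorical(pi) and gamma_i ~ N(mu[z_i], theta[z_i])."""
    K = len(pi)
    z = rng.choice(K, size=n_voxels, p=pi)     # component assignment per voxel
    gamma = np.stack([rng.multivariate_normal(mu[k], theta[k]) for k in z])
    return z, gamma

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                          # component weights
mu = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])     # component centers
theta = np.stack([0.1 * np.eye(2)] * 3)                 # component covariances
z, gamma = sample_prior(200, pi, mu, theta, rng)
```

Because every subject draws from the same mixture, coordinates sampled for different subjects fall around the same centers, which is what places the subject maps in correspondence.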
3 Atlas Learning and Inference
We employ the variational EM algorithm [8] to estimate the parameters of our model from the observed data. We approximate the posterior distribution of latent variables p(Γ , z|L) with a product distribution of the form
\[ q(\Gamma, z) = \prod_{s=1}^{S} \prod_{i=1}^{N_s} q(\gamma_{si})\, q(z_{si}). \tag{8} \]
The problem is then formulated as the minimization of the Gibbs free energy
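\[ F = \mathbb{E}_q\!\left[ \ln q(\Gamma, z) \right] - \mathbb{E}_q\!\left[ \ln p(\Gamma, z, L; \mu, \Theta, \pi) \right], \tag{9} \]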
where $\mathbb{E}_q$ indicates the expected value operator with respect to distribution q(·). We derive coordinate descent update rules that, given an initialization of all latent variables and parameters, find a local minimum of the cost function in Eq. (9). Appendix B presents the derivation of the update rules.
3.1 Initialization
The algorithm requires initial estimates of the latent variables and model parameters. Initialization affects convergence and the quality of the final solution. Here, we describe a simple initialization scheme that matches closely the structure of our model and enhances the performance of the algorithm.
In general, the relationship between the diffusion map coordinates Γ and the corresponding symmetric matrix L is defined up to an arbitrary orthonormal matrix Q since $(\Gamma Q)(\Gamma Q)^T = \Gamma Q Q^T \Gamma^T = \Gamma \Gamma^T = L$. In order to define an atlas of the functional connectivity across all subjects, we seek matrix Qs for each subject s such that the maps are aligned in a common coordinate frame. Consider aligning the diffusion map Γs of subject s to the diffusion map Γr of reference subject r. Similar to the construction of the diffusion map, we compute the inter-subject affinities between the fMRI signals of subjects s and r using Eq. (1) and only keep those with a correlation above the threshold. This step produces a set of M node pairs $\{(i_m, j_m)\}_{m=1}^{M}$, characterized by affinities $\{w_m\}_{m=1}^{M}$. The initialization should ensure that nodes with similar fMRI signals are close in the common embedding space. Therefore, we choose matrix Q that minimizes the weighted Euclidean distance between pairs of corresponding embedding coordinates
\[ Q^{(s,r)} = \arg\min_{Q:\ Q^T Q = I}\ \sum_{m=1}^{M} w_m \left\| Q^T \gamma_{s i_m} - \gamma_{r j_m} \right\|^2. \tag{10} \]
We define matrices $\Gamma_{sm} = [\gamma_{s i_1}, \ldots, \gamma_{s i_M}]^T$ and $\Gamma_{rm} = [\gamma_{r j_1}, \ldots, \gamma_{r j_M}]^T$. It can be shown that $Q^{(s,r)} = U V^T$, where we use the singular value decomposition $\Gamma_{sm}^T \operatorname{diag}(w_m)\, \Gamma_{rm} = U S V^T$ [16].
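The alignment step is a weighted orthogonal Procrustes problem, and a minimal NumPy sketch is given below. It assumes the row-vector convention in which the aligned map is `gamma_s @ Q`; the function name `align_orthogonal` and the synthetic test data are hypothetical.

```python
import numpy as np

def align_orthogonal(gamma_s, gamma_r, w):
    """Orthonormal Q minimizing sum_m w_m * ||gamma_s[m] @ Q - gamma_r[m]||^2.

    gamma_s, gamma_r: (M, L) coordinates of the matched node pairs.
    w: (M,) non-negative affinities of the pairs.
    """
    M = gamma_s.T @ (w[:, None] * gamma_r)   # Gamma_sm^T diag(w) Gamma_rm
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                            # may include a reflection, not only a rotation

rng = np.random.default_rng(0)
gamma_s = rng.standard_normal((30, 3))
Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthonormal matrix
gamma_r = gamma_s @ Q_true                              # reference = transformed copy
w = rng.random(30) + 0.1
Q = align_orthogonal(gamma_s, gamma_r, w)
```

On this noise-free toy problem the recovered `Q` equals `Q_true`; with real data the weights let strongly correlated voxel pairs dominate the fit.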
We find initial estimates of the mixture parameters {μk}, {Θk}, and {πk} by fitting a K-component Gaussian mixture model to the initial estimates of the atlas embedding coordinates for a randomly chosen reference subject r.
4 Experiments and Results
Data
We demonstrate the method on a set of six healthy control subjects. The fMRI data was acquired using a 3T GE Signa system (TR=2s, TE=40ms, flip angle=90°, slice gap=0mm, FOV=25.6cm, volume size of 128 × 128 × 27 voxels, voxel size of 2 × 2 × 4 mm³). The language task (antonym generation) block design was 5min 10s long, starting with a 10s pre-stimulus period. Eight task and seven rest blocks, 20s each, alternated in the design. For each subject, an anatomical T1 MRI scan was acquired and registered to the functional data. Grey matter was segmented with FSL [19] on the T1 data. The grey matter labels were transferred to the co-registered fMRI volumes, and computation was restricted to grey matter.
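The timing of the block design can be checked with a short sketch: 10 s of pre-stimulus plus fifteen 20 s blocks gives 310 s, i.e. 5 min 10 s, or 155 volumes at TR = 2 s. Since eight task and seven rest blocks alternate, the sequence must begin and end with a task block. Sampling the design at volume onsets and the function name `block_paradigm` are assumptions of this example.

```python
import numpy as np

def block_paradigm(tr=2.0, pre=10.0, block=20.0, n_task=8, n_rest=7):
    """Boxcar regressor sampled at volume onsets (1 = task, 0 = rest or pre-stimulus)."""
    total = pre + (n_task + n_rest) * block          # total duration in seconds
    times = np.arange(0.0, total, tr)                # onset time of each volume
    idx = np.floor((times - pre) / block).astype(int)  # block index after the pre-stimulus period
    task = (times >= pre) & (idx % 2 == 0)           # even-numbered blocks are task blocks
    return times, task.astype(float)

times, task = block_paradigm()
```

Convolving this boxcar with a hemodynamic response function would give the kind of regressor against which cluster signals are compared in the results.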
Evaluation
We construct a joint functional diffusion map for all six subjects. For the results presented in this paper, we set the dimensionality of the diffusion map to be L = 20 and choose a diffusion time t = 2 that satisfies $(\lambda_L/\lambda_1)^t < 0.2$ for all subjects. To facilitate computation, we only keep nodes for which the degree is above a certain threshold. In the experiments reported here we choose a threshold of 100. For the EM algorithm, we fix the value of σs for the first 10 iterations, then allow this parameter to update for the remaining iterations according to the rule defined in Appendix B. In our experiments, an initial value of σs proportional to $10^2$ allows the algorithm to achieve the lowest Gibbs free energy.
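The eigenvalue-ratio criterion for the diffusion time can be illustrated with a small helper. The text only states that the chosen t satisfies the inequality; searching for the smallest such integer, as done here, is an assumption, as are the function name `smallest_diffusion_time` and the toy eigenvalues.

```python
def smallest_diffusion_time(eigenvalues, n_dims, ratio=0.2, t_max=100):
    """Smallest integer t with (lambda_L / lambda_1) ** t < ratio, or None if not found."""
    lam = sorted((abs(x) for x in eigenvalues), reverse=True)
    r = lam[n_dims - 1] / lam[0]          # ratio of the L-th to the first eigenvalue
    for t in range(1, t_max + 1):
        if r ** t < ratio:
            return t
    return None

t_three = smallest_diffusion_time([1.0, 0.8, 0.4], n_dims=3)
t_two = smallest_diffusion_time([1.0, 0.8, 0.4], n_dims=2)
```

To apply the criterion to a group, one would evaluate it per subject and take the largest resulting t so that the inequality holds for all subjects.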
We hypothesize that working in the embedding space should allow us to more robustly capture the functional structure common to all subjects. In order to validate this, we compare the consistency of clustering structures found in the space of fMRI time courses (Signal), a low-dimensional (L=20) PCA embedding of these time courses (PCA-Signal), and the low-dimensional (L=20) embedding proposed in this paper. We report results for the initial alignment (Linear-Atlas) and the result of learning the joint atlas (Atlas).
We first apply clustering to signals from individual subjects separately to find subject-specific cluster assignments. We then apply clustering to signals combined from all subjects to construct the corresponding group-wise cluster assignments. Since our group atlas for the lower-dimensional space is based on a mixture model, we also choose a mixture-model for clustering in the Signal and PCA-Signal spaces. In both cases, each component in the mixture is an isotropic von Mises-Fisher distribution, defined on a hyper-sphere after centering and normalization of the fMRI signals to unit variance [12].
Likewise, we cluster the diffusion map coordinates Γs separately in each subject to obtain subject-specific assignments. To obtain group-wise clustering assignments, we cluster the diffusion map coordinates of all subjects aligned to the first subject, $\{\Gamma_s Q^{(s,1)}\}_s$, for the Linear-Atlas, and the embedding coordinates estimated by the joint atlas model for Atlas. Analyzing the consistency of clustering labels across methods evaluates how the population structure captures the individual embeddings. For the diffusion maps, Euclidean distance is a meaningful metric; we therefore use a mixture model with Gaussian components that share the same isotropic variance.
Since clusters are labeled arbitrarily in each result, we match group and subject-specific cluster labels by solving a bipartite graph matching problem. Here, we find a one-to-one label correspondence that maximizes voxel overlap between pairs of clusters, similar to the method used in [12]. After matching the labels, we use the Dice score [3] to measure the consistency between group and subject-specific assignments for each cluster.
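The matching and scoring steps can be sketched as follows, using SciPy's assignment solver to maximize total voxel overlap and then computing a per-cluster Dice score. This is an illustrative sketch rather than the evaluation code used in the study; the function name `match_and_dice` and the toy labels are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_dice(group_labels, subject_labels, n_clusters):
    """Match subject labels to group labels one-to-one, then score each cluster."""
    g = np.asarray(group_labels)
    s = np.asarray(subject_labels)
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(overlap, (g, s), 1)                     # overlap[a, b] = voxels with group a, subject b
    rows, cols = linear_sum_assignment(-overlap)      # negate to maximize total overlap
    mapping = {int(c): int(r) for r, c in zip(rows, cols)}   # subject label -> group label
    size_g = np.bincount(g, minlength=n_clusters)
    size_s = np.bincount(s, minlength=n_clusters)
    dice = np.zeros(n_clusters)
    for gk, sk in zip(rows, cols):
        denom = size_g[gk] + size_s[sk]
        dice[gk] = 2.0 * overlap[gk, sk] / denom if denom else 0.0
    return mapping, dice

group = [0, 0, 0, 1, 1, 2]
subject = [2, 2, 1, 0, 0, 1]
mapping, dice = match_and_dice(group, subject, n_clusters=3)
```

In this toy case subject labels 2, 0, and 1 are matched to group labels 0, 1, and 2, and the second group cluster is reproduced exactly.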
4.1 Results
Fig. 2 reports the consistency of clusters between group-level and subject-specific assignments, measured in terms of Dice score averaged across subjects. To illustrate the temporal nature of the clusters, the colors of the bars indicate the correlation of the average fMRI signal in the cluster with the fMRI language paradigm convolved with the hemodynamic response function. Note that the paradigm was not used at any point during the generation of the maps or clusters. The cluster with the highest paradigm correlation is the most consistent cluster over a large range of cluster numbers. Signal clustering achieves a highest Dice score of 0.725, with similar values for larger numbers of clusters. Although the plot is not shown here, clustering in the PCA-Signal space exhibits no noticeable improvement. Initial alignment of the diffusion maps into the Linear-Atlas substantially increases the Dice score of the highest ranked clusters for all K, with a maximum value of 0.876. The variational EM algorithm, run over a range of reasonable cluster numbers, further improves the cluster agreement for the top ranked clusters (0.905).
Fig. 3 shows the networks of the subjects that correspond to the top ranked atlas cluster (K=10), together with the corresponding average fMRI signal. The paradigm is recovered very well, and for most subjects the cluster network plausibly spans visual, motor, and language areas.
Fig. 4 compares the location and average signal of the top ranked of 10 clusters for Signal and Atlas clustering in a single subject. While both recover parts of the paradigm, the clustering in the diffusion map atlas exhibits more consistency between the group and the subject levels. Additionally, the Signal clusters suffer from a relatively high dispersion across the entire cortex. This is not the case for the diffusion map atlas. In summary, these results demonstrate that the representation of fMRI time courses in the low dimensional space of diffusion maps better captures the functional connectivity structure across subjects. Not only are clustering assignments more consistent, but the anatomical characteristics of these clusters are also more plausible. Furthermore, our results using the variational EM algorithm suggest that the probabilistic population model further improves the consistency across the population, and consolidates the distribution in the embedding space.
5 Conclusion
We propose a method to learn an atlas of the functional connectivity structure that emerges during a cognitive process from a group of individuals. The atlas is a group-wise generative model that describes the fMRI responses of all subjects in the embedding space. The embedding space is a low dimensional representation of fMRI time courses that encodes the functional connectivity patterns within each subject. Results from an fMRI language experiment indicate that the diffusion map framework captures the connectivity structure reliably, and leads to valid correspondences across subjects. Future work will focus on the application of the framework to the study of reorganization processes.
Acknowledgements
This work was funded in part by the NSF IIS/CRCNS 0904625 grant, the NSF CAREER 0642971 grant, the NIH NCRR NAC P41RR13218, NIH NIBIB NAMIC U54-EB005149, NIH U41RR019703, and NIH P01CA067165 grants, the Brain Science Foundation, the Klarman Family Foundation, and EU (FP7/2007-2013) n°257528 (KHRESMOI).
A Diffusion Map Coordinates
In the standard diffusion map analysis, the embedding coordinates Γ for an L-dimensional space are obtained via the first L eigenvectors of matrix $A = D^{-1/2} W D^{-1/2}$ [13]. Here we show that we can represent the embedding as a solution of a least-squares problem formulated directly on the similarity matrix W.
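Formally, $\Gamma_t = D^{-1/2} V_L \Lambda_L^{t}$, where $A = V \Lambda V^T$ is the eigenvector decomposition of matrix A, t is the diffusion time, and subscripts indicate that we select the first L eigenvectors. Matrix $V_L \Lambda_L^{2t} V_L^T$ is a low-rank approximation of matrix $A^{2t}$ that is quite accurate if the remaining eigenvalues are much smaller than the sum of the first L eigenvalues. We define
\[ L_t = D^{-1/2} A^{2t} D^{-1/2} \tag{11} \]
and use a generalization of the Eckart-Young theorem [6] to formulate the eigen decomposition as an optimization problem:
\[ \Gamma_t^{*} = \arg\min_{\Gamma \in \mathbb{R}^{N \times L}} \left\| D^{1/2} \left( L_t - \Gamma \Gamma^T \right) D^{1/2} \right\|_F^2 = \arg\min_{\Gamma \in \mathbb{R}^{N \times L}} \sum_{i,j} d_i\, d_j \left( L_t(i,j) - \gamma_i^T \gamma_j \right)^2, \tag{12} \]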
where ∥·∥F denotes the Frobenius norm.
B Variational EM Update Rules
We use a natural choice of a multinomial distribution for cluster membership q(zsi = k) for s ∈ {1, . . . , S}, i ∈ {1, . . . , Ns}, and a Gaussian distribution q(γsi) for the embedding coordinates, parameterized by its mean and component-wise variance.
E-Step
We determine the parameter values of the approximating probability distribution q(·) that minimize the Gibbs free energy in Eq. (9) by evaluating the expectation, differentiating with respect to each parameter and setting the derivatives to zero. This yields
Rather than solve the coupled system of equations above, we iteratively update each parameter of the distribution q(·) while fixing all the other parameters.
M-Step
We now find the parameter values similar to the standard EM algorithm, but using the approximating distribution q(·) to evaluate the expectation. Specifically, we find
(13) |
(14) |
(15) |
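For orientation, the sketch below shows the textbook mixture-model parameter updates under a factorized posterior with Gaussian q(γ) of diagonal variance. It is an assumption-laden illustration and may differ in detail from update rules (13) to (15); in particular it does not cover the update of σs. The function name `m_step` and its argument layout (voxels of all subjects stacked into one array) are hypothetical.

```python
import numpy as np

def m_step(resp, mean, var):
    """Standard mixture updates from variational posteriors.

    resp: (n, K) values of q(z = k) for n voxels pooled over subjects.
    mean: (n, L) posterior means of the embedding coordinates.
    var:  (n, L) component-wise posterior variances.
    """
    n, K = resp.shape
    L = mean.shape[1]
    nk = resp.sum(axis=0)                       # effective number of voxels per component
    pi = nk / n                                 # component weights
    mu = (resp.T @ mean) / nk[:, None]          # responsibility-weighted centers
    theta = np.empty((K, L, L))
    for k in range(K):
        diff = mean - mu[k]
        scatter = (resp[:, k, None] * diff).T @ diff
        # E_q[(gamma - mu)(gamma - mu)^T] adds the posterior variance on the diagonal
        theta[k] = (scatter + np.diag(resp[:, k] @ var)) / nk[k]
    return pi, mu, theta

resp = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mean = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [5.0, 7.0]])
pi, mu, theta = m_step(resp, mean, np.zeros((4, 2)))
```

With hard assignments and zero posterior variance, as in the toy call, the updates reduce to per-cluster sample means and covariances.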
References
1. Buckner RL, Andrews-Hanna JR, Schacter DL. The brain's default network: anatomy, function, and relevance to disease. Ann. N Y Acad. Sci. 2008;1124:1–38. doi: 10.1196/annals.1440.011.
2. Coifman RR, Lafon S. Diffusion maps. App. Comp. Harm. An. 2006;21:5–30.
3. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26(3):297–302.
4. Elbert T, Rockstroh B. Reorganization of human cerebral cortex: the range of changes following use and injury. Neuroscientist. 2004;10(2):129–141. doi: 10.1177/1073858403262111.
5. Elkana O, Frost R, Kramer U, Ben-Bashat D, Hendler T, Schmidt D, Schweiger A. Cerebral reorganization as a function of linguistic recovery in children: An fMRI study. Cortex. 2009 December. doi: 10.1016/j.cortex.2009.12.003.
6. Friedland S, Torokhti A. Generalized rank-constrained matrix approximations. arXiv preprint math/0603674. 2006.
7. Friston K, Frith C, Fletcher P, Liddle P, Frackowiak R. Functional topography: multidimensional scaling and functional connectivity in the brain. Cerebral Cortex. 1996;6(2):156. doi: 10.1093/cercor/6.2.156.
8. Jaakkola T. Tutorial on variational approximation methods. Advanced Mean Field Methods: Theory and Practice. 2000:129–159.
9. Kuhl PK. Brain mechanisms in early language acquisition. Neuron. 2010;67(5):713–727. doi: 10.1016/j.neuron.2010.08.038.
10. Langs G, Samaras D, Paragios N, Honorio J, Alia-Klein N, Tomasi D, Volkow ND, Goldstein RZ. Task-specific functional brain geometry from model maps. In: Metaxas D, Axel L, Fichtinger G, Székely G, editors. MICCAI 2008, Part I. LNCS. Vol. 5241. Springer; Heidelberg: 2008. pp. 925–933.
11. Langs G, Tie Y, Rigolo L, Golby A, Golland P. Functional geometry alignment and localization of brain areas. In: Lafferty J, Williams CKI, Shawe-Taylor J, Zemel R, Culotta A, editors. Advances in Neural Information Processing Systems. 2010;23:1225–1233.
12. Lashkari D, Vul E, Kanwisher N, Golland P. Discovering structure in the space of fMRI selectivity profiles. Neuroimage. 2010;50(3):1085–1098. doi: 10.1016/j.neuroimage.2009.12.106.
13. Nadler B, Lafon S, Coifman R, Kevrekidis I. Diffusion maps: a probabilistic interpretation for spectral embedding and clustering algorithms. Principal Manifolds for Data Visualization and Dimension Reduction. 2007:238–260.
14. Rosales R, Frey B. Learning generative models of affinity matrices. Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2003). 2003:485–492.
15. Saxe R, Brett M, Kanwisher N. Divide and conquer: a defense of functional localizers. Neuroimage. 2006;30(4):1088–1096. doi: 10.1016/j.neuroimage.2005.12.062.
16. Scott G, Longuet-Higgins H. An algorithm for associating the features of two images. Proceedings: Biological Sciences. 1991;244(1309):21–26. doi: 10.1098/rspb.1991.0045.
17. Thirion B, Dodel S, Poline JB. Detection of signal synchronizations in resting-state fMRI datasets. Neuroimage. 2006;29(1):321–327. doi: 10.1016/j.neuroimage.2005.06.054.
18. Von Luxburg U. A tutorial on spectral clustering. Statistics and Computing. 2007;17(4):395–416.
19. Woolrich MW, Jbabdi S, Patenaude B, Chappell M, Makni S, Behrens T, Beckmann C, Jenkinson M, Smith SM. Bayesian analysis of neuroimaging data in FSL. Neuroimage. 2009;45(suppl. 1):S173–S186. doi: 10.1016/j.neuroimage.2008.10.055.