UNSUPERVISED AUTOMATIC WHITE MATTER FIBER CLUSTERING USING A GAUSSIAN MIXTURE MODEL

Meizhu Liu; Baba C Vemuri; Rachid Deriche

doi:10.1109/ISBI.2012.6235600

Author manuscript; available in PMC: 2012 Dec 31.

Published in final edited form as: Proc IEEE Int Symp Biomed Imaging. 2012 May;2012(9):522–525. doi: 10.1109/ISBI.2012.6235600


PMCID: PMC3533447 NIHMSID: NIHMS356583 PMID: 23285315

Abstract

Fiber tracking from diffusion tensor images is an essential step in numerous clinical applications, and there is a growing demand for an accurate and efficient framework for the quantitative analysis of white matter fiber bundles. In this paper, we propose a robust framework for fiber clustering. The framework has two components: an accessible fiber representation and a statistically robust divergence measure for comparing fibers. Each fiber is represented using a Gaussian mixture model (GMM), which is a linear combination of Gaussian distributions. The dissimilarity between two fibers is measured using the total square loss between their corresponding GMMs, which is statistically robust. Finally, we apply the hierarchical total Bregman soft clustering algorithm to the GMMs, yielding clustered fiber bundles. Furthermore, our method determines the number of clusters automatically. We present experimental results demonstrating the favorable performance of our method on both synthetic and real data.

Keywords: Fiber clustering, Gaussian mixture model, Gaussian process, total Bregman divergence, total square loss

1. INTRODUCTION

Diffusion magnetic resonance imaging (dMRI) is a widely used technique that has been extensively studied in the literature. It uses diffusion-sensitizing gradients to non-invasively capture the anisotropic properties of tissue. Diffusion tensor imaging (DTI) approximates the diffusivity function by a symmetric positive definite tensor of order two [1]. DTI captures the diffusion direction of water molecules in the brain, which is associated with the direction of fiber tracts in the white matter. When this movement is hindered by membranes and macromolecules, water diffusion becomes anisotropic. Therefore, in highly structured tissues such as nerve fibers, this anisotropy can be used to characterize local tissue structure. Fiber tractography and fiber clustering are promising for many applications, such as the analysis of brain tumors, migraines, eclampsia and other disorders. All of these applications would benefit from an accurate and robust fiber analysis technique.

However, visualizing such a large and complex 3D fiber tract dataset accurately and efficiently enough to gain clinical insight is extremely difficult. To tackle this problem, many fiber clustering methods have been proposed recently; for example, tracts have been successfully clustered into bundles using spectral clustering [2], normalized cuts [3], agglomerative hierarchical clustering [4] and k-nearest neighbors [5]. For fiber representation, typical methods represent a fiber using a set of points [6, 5], shapes [7, 8], splines [9], geometric characteristics of fibers (e.g., length, center of mass and second-order moments) [10], and other descriptors [11, 12]. For fiber comparison, the difference between fibers is quantified using various metrics, including the closest point distance, the chamfer distance, the Hausdorff distance and their symmetric versions [8, 9, 13, 6].

In this paper, we propose a novel method for fiber clustering. Its novelty is threefold. First, we present an easily accessible fiber representation. Second, we present a robust and efficient divergence measure for the dissimilarity between fibers. Finally, we present a hierarchical clustering framework that is not only robust but can also determine the number of clusters automatically. Specifically, we use a Gaussian mixture model (GMM) to represent each fiber; the GMM is computed from the points lying on the fiber. Note that earlier work in the literature [14] used a Gaussian process representation of the fibers, not a GMM representation. Given the GMM of each fiber, the dissimilarity between two fibers is computed using the total square loss (tSL) between their GMMs. tSL belongs to the class of total Bregman divergences (tBD), which were proposed recently [15] and have been shown to be statistically robust. A tBD is based on the orthogonal distance between the convex generating function of the divergence and its tangent approximation at the second argument of the divergence; we refer readers to [15] for details. Finally, the tBD hierarchical soft clustering algorithm is applied to the GMMs, which groups the fibers into bundles. The main benefit of this clustering algorithm is its efficiency, since the cluster center of a set of GMMs has a closed form. Furthermore, the number of clusters is detected automatically. The entire framework is explained in detail in Section 2.

Algorithm 1. Total Bregman Soft Clustering Algorithm

Input: The data to be clustered, $X = \{x_i\}_{i=1}^{N}$, and the number of clusters $c$.
Output: $M = \{m_j\}_{j=1}^{c}$ and $Q = \{q_j\}_{j=1}^{c}$, where $m_j$ is the center of the $j$th cluster and $q_j$ is its probability.
Initialization: Randomly choose $c$ elements of $X$ as $M$ and set $Q$ to the uniform distribution.

repeat
    {assign $x_i$ to clusters}
    for i = 1 to N do
        for j = 1 to c do
            $q(j \mid x_i) \leftarrow \frac{q_j \exp(-d(m_j, x_i))}{\sum_{j'=1}^{c} q_{j'} \exp(-d(m_{j'}, x_i))}$
        end for
    end for
    {update cluster centers}
    for j = 1 to c do
        $q_j \leftarrow \frac{1}{N} \sum_{i=1}^{N} q(j \mid x_i)$
        $m_j \leftarrow$ t-center of cluster $j$
    end for
until the change in the results between two consecutive iterations is below a threshold.
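To make the loop concrete, the following Python/NumPy sketch implements Algorithm 1 for vector-valued data. It assumes the divergence is the total square loss between vectors, $d(m, x) = \|m - x\|^2 / \sqrt{1 + 4\|x\|^2}$ (the vector analogue of (5)), and it assumes the t-center of cluster $j$ weights each $x_i$ by its posterior $q(j \mid x_i)$, which yields a posterior- and tBD-weighted mean. The names `tsl_vec` and `tbd_soft_cluster` and the optional `init` argument (fixed initial center indices) are hypothetical conveniences, not part of the original algorithm.

```python
import numpy as np


def tsl_vec(m, X):
    """d(m, x_i) = ||m - x_i||^2 / sqrt(1 + 4 ||x_i||^2) for every row x_i of X."""
    return np.sum((X - m) ** 2, axis=1) / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))


def tbd_soft_cluster(X, c, init=None, tol=1e-8, max_iter=200, seed=0):
    """Soft clustering in the style of Algorithm 1; returns (M, Q, posteriors)."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if init is None:  # Initialization: c random elements of X as the centers
        init = np.random.default_rng(seed).choice(N, size=c, replace=False)
    M = X[np.asarray(init)].copy()
    Q = np.full(c, 1.0 / c)                                 # uniform cluster probabilities
    w = 1.0 / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))  # per-point tBD weights
    for _ in range(max_iter):
        # Assignment step: q(j | x_i) proportional to q_j * exp(-d(m_j, x_i)).
        D = np.stack([tsl_vec(M[j], X) for j in range(c)], axis=1)  # shape (N, c)
        logits = np.log(Q)[None, :] - D
        logits -= logits.max(axis=1, keepdims=True)  # avoid underflow before normalizing
        post = np.exp(logits)
        post /= post.sum(axis=1, keepdims=True)
        # Update step: cluster probabilities and posterior-weighted t-centers.
        Q_new = post.mean(axis=0)
        coef = post * w[:, None]
        M_new = (coef.T @ X) / coef.sum(axis=0)[:, None]
        change = max(np.abs(M_new - M).max(), np.abs(Q_new - Q).max())
        M, Q = M_new, Q_new
        if change < tol:  # stop when consecutive iterations barely differ
            break
    return M, Q, post


# Usage: two well-separated groups of 2-D points, with fixed initial centers.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],
              [10.0, 10.0], [10.2, 10.0], [10.0, 10.2], [10.2, 10.2]])
M, Q, post = tbd_soft_cluster(X, c=2, init=[0, 4])
labels = post.argmax(axis=1)
```

Computing the posteriors in the log domain and subtracting the row maximum avoids underflow when some divergences are large; it does not change the normalized values of $q(j \mid x_i)$.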


The rest of the paper is organized as follows. In Section 2, we introduce our proposed method, followed by the empirical validation in Section 3. Finally, we conclude in Section 4.

2. PROPOSED METHOD

In this section, we first introduce the GMM representation of fibers and then present the tBD soft clustering algorithm used for fiber clustering. Finally, we describe how our method automatically determines the number of clusters.

2.1. Fiber representation

Suppose the set of points on a fiber $F$ is denoted by $\{f_i\}_{i=1}^{|F|}$, where $|F|$ denotes the number of points on the fiber $F$. A fiber is characterized by a blurred indicator function $I_F(f)$, which is modeled by a Gaussian process as in [14]:

I_{F} (f) \sim N (I_{F}^{*} (f), σ_{F} (f)) .

(1)

The maximal level set of this indicator function corresponds to the fiber $F$ [14]. In this paper, we represent the fiber as a GMM, which is consistent with the above definitions:

p_{F} = \frac{1}{∣ F ∣} \sum_{i = 1}^{∣ F ∣} N (I_{F}^{*} (f_{i}), σ_{F} (f_{i})) .

(2)
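As a rough illustration of (2), the Python/NumPy sketch below builds an equal-weight GMM for one fiber with one component per sample point. It assumes, as a simplification that the text does not state, that each component is an isotropic Gaussian centered at the sample point $f_i$ with a fixed width `sigma`, rather than a point-dependent $σ_F(f_i)$. The function name `fiber_to_gmm` and the (weights, means, covariances) tuple layout are hypothetical.

```python
import numpy as np


def fiber_to_gmm(points, sigma=1.0):
    """Return (weights, means, covs) for an equal-weight GMM of one fiber.

    Hypothetical simplification: component i is centered at the sample point
    f_i with covariance sigma**2 * I, i.e. the same width for every point.
    """
    points = np.asarray(points, dtype=float)          # shape (|F|, n)
    num_points, dim = points.shape
    weights = np.full(num_points, 1.0 / num_points)   # the 1/|F| factor in (2)
    covs = np.tile(sigma ** 2 * np.eye(dim), (num_points, 1, 1))
    return weights, points.copy(), covs


# Usage: a straight synthetic fiber with 4 points along the x-axis.
w, mu, cov = fiber_to_gmm([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], sigma=0.5)
```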

The dissimilarity between two fibers is computed using the total square loss (tSL) defined in [15], which is reproduced below for convenience. Given two GMMs p₁ and p₂,

p_{1} (x) = \sum_{i = 1}^{m} a_{i}^{(1)} N (x; μ_{i}^{(1)}, Σ_{i}^{(1)}),

(3)

p_{2} (x) = \sum_{i = 1}^{m} a_{i}^{(2)} N (x; μ_{i}^{(2)}, Σ_{i}^{(2)}),

(4)

where $x \in \mathbb{R}^n$, $m$ is the number of Gaussian components, and $a_i^{(k)}$, $μ_i^{(k)}$ and $Σ_i^{(k)}$ are the weight, mean and covariance matrix of the $i$th component of $p_k$. The tSL between p₁ and p₂ is

d (p_{1}, p_{2}) = \frac{\int {(p_{1} - p_{2})}^{2} d x}{\sqrt{1 + \int 4 p_{2}^{2} d x}} = \frac{d_{1} + d_{2} - d_{1, 2}}{\sqrt{1 + 4 d_{2}}},

(5)

where

d_{1} = \sum_{i, j = 1}^{m} a_{i}^{(1)} a_{j}^{(1)} N (0; μ_{i}^{(1)} - μ_{j}^{(1)}, Σ_{i}^{(1)} + Σ_{j}^{(1)}),

(6)

d_{2} = \sum_{i, j = 1}^{m} a_{i}^{(2)} a_{j}^{(2)} N (0; μ_{i}^{(2)} - μ_{j}^{(2)}, Σ_{i}^{(2)} + Σ_{j}^{(2)}),

(7)

d_{1, 2} = 2 \sum_{i, j = 1}^{m} a_{i}^{(1)} a_{j}^{(2)} N (0; μ_{i}^{(1)} - μ_{j}^{(2)}, Σ_{i}^{(1)} + Σ_{j}^{(2)}),

(8)

N (0; μ, Σ) = \frac{1}{{(\sqrt{2 π})}^{n} \sqrt{\det (Σ)}} \exp (- \frac{μ^{\top} Σ^{- 1} μ}{2}) .

(9)

The tSL between two GMMs depends only on the weights, means and covariance matrices of their Gaussian components, so it has a closed form and is efficient to compute.
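Because (5) to (9) involve only pairwise Gaussian overlaps, the tSL can be evaluated directly. The following Python/NumPy sketch assumes each GMM is stored as a (weights, means, covariances) tuple of arrays; the helper names `gauss_at_zero`, `overlap` and `tsl` are hypothetical, and the sketch makes no attempt at vectorization or at guarding against near-singular covariances.

```python
import numpy as np


def gauss_at_zero(mu, cov):
    """N(0; mu, cov) from (9): the N(mu, cov) density evaluated at the origin."""
    n = mu.shape[0]
    quad = mu @ np.linalg.solve(cov, mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))


def overlap(gmm_a, gmm_b):
    """sum_{i,j} a_i b_j N(0; mu_i - mu_j, S_i + S_j), i.e. the integral of p_a * p_b."""
    wa, ma, ca = gmm_a
    wb, mb, cb = gmm_b
    return sum(wa[i] * wb[j] * gauss_at_zero(ma[i] - mb[j], ca[i] + cb[j])
               for i in range(len(wa)) for j in range(len(wb)))


def tsl(p1, p2):
    """Total square loss d(p1, p2) of (5), assembled from (6)-(8)."""
    d1 = overlap(p1, p1)           # (6)
    d2 = overlap(p2, p2)           # (7)
    d12 = 2.0 * overlap(p1, p2)    # (8)
    return (d1 + d2 - d12) / np.sqrt(1.0 + 4.0 * d2)


# Usage: two 1-D single-component mixtures with variance 0.5 and means 1 and 0.
p1 = (np.array([1.0]), np.array([[1.0]]), np.array([[[0.5]]]))
p2 = (np.array([1.0]), np.array([[0.0]]), np.array([[[0.5]]]))
value = tsl(p1, p2)
zero = tsl(p1, p1)
```

Since the denominator involves only $p_2$, $d(p_1, p_2)$ is in general not symmetric in its two arguments.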

Given a set of fibers and their corresponding GMMs $\{p_l\}_{l=1}^{n}$, their center (average or representative) is defined through the following minimization:

\bar{p} = arg min_{p} \sum_{l = 1}^{n} d (p, p_{l}) .

(10)

The solution to this minimization can be written in closed form as,

\bar{p} = \frac{\sum_{l = 1}^{n} w_{l} p_{l}}{\sum_{l = 1}^{n} w_{l}}, w_{l} = {(1 + 4 d_{l})}^{- 1 / 2} .

(11)

where $d_l = \int p_l^2\,dx$ is computed as in (7). This closed-form solution makes the center computations in the clustering algorithm described below fast.
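A weighted average of GMMs, as in (11), is itself a GMM whose components are the union of the input components with rescaled weights, so the center can be formed without any optimization. The Python/NumPy sketch below assumes each GMM is stored as a (weights, means, covariances) tuple; `self_overlap` computes $d_l = \int p_l^2\,dx$ with the double sum of (7), and both function names are hypothetical.

```python
import numpy as np


def self_overlap(gmm):
    """d_l = integral of p_l(x)^2 dx, via the double sum in (7)."""
    w, mu, cov = gmm
    n = mu.shape[1]
    total = 0.0
    for i in range(len(w)):
        for j in range(len(w)):
            diff = mu[i] - mu[j]
            s = cov[i] + cov[j]
            quad = diff @ np.linalg.solve(s, diff)
            total += w[i] * w[j] * np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(s))
    return total


def t_center(gmms):
    """Closed-form center (11): a GMM that holds every input component.

    The component weights of p_l are scaled by w_l / sum(w), w_l = (1 + 4 d_l)^(-1/2).
    """
    scale = np.array([1.0 / np.sqrt(1.0 + 4.0 * self_overlap(g)) for g in gmms])
    scale = scale / scale.sum()
    weights = np.concatenate([s * g[0] for s, g in zip(scale, gmms)])
    means = np.concatenate([g[1] for g in gmms])
    covs = np.concatenate([g[2] for g in gmms])
    return weights, means, covs


# Usage: a narrow and a wide 1-D Gaussian; the narrow one has the larger d_l,
# so it receives the smaller weight in the center.
g1 = (np.array([1.0]), np.array([[0.0]]), np.array([[[0.5]]]))
g2 = (np.array([1.0]), np.array([[0.0]]), np.array([[[2.0]]]))
weights, means, covs = t_center([g1, g2])
```

Because the center keeps every input component, its size grows with the number of fibers in the cluster; this sketch does not prune or merge components.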

2.2. Hierarchical soft clustering

A hierarchical divisive clustering is performed using the tBD soft clustering algorithm proposed in [16], which is reproduced in Algorithm 1 for the reader's convenience.

Let $l$ denote the level of clustering. The original (entire) dataset is at level $l = 0$. We first split the whole dataset into two clusters, which form level $l = 1$. Then, from the clusters at the current level $l$, we choose the cluster with the largest maximum divergence between its center and its members, and split it into two clusters. These two clusters, together with all other clusters at level $l$, form the clusters at level $l + 1$. This process is repeated at each new level.
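The level-by-level splitting can be sketched in Python/NumPy as follows for vector-valued data. The sketch assumes the vector total square loss $d(m, x) = \|m - x\|^2 / \sqrt{1 + 4\|x\|^2}$ with its closed-form weighted-mean center, and it takes the two-way split as a callable. In the text each split is done with the tBD soft clustering algorithm; the mean-threshold `split_by_first_coordinate` below is only a hypothetical stand-in that keeps the example short, and all function names are illustrative.

```python
import numpy as np


def tsl_vec(m, X):
    """Vector total square loss d(m, x_i) for every row x_i of X."""
    return np.sum((X - m) ** 2, axis=1) / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))


def t_center(X):
    """Minimizer of sum_i d(m, x_i): the mean weighted by (1 + 4||x_i||^2)^(-1/2)."""
    w = 1.0 / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))
    return (w[:, None] * X).sum(axis=0) / w.sum()


def max_divergence(X):
    """Largest divergence between the cluster center and one of its members."""
    return float(tsl_vec(t_center(X), X).max())


def divisive_hierarchy(X, split, num_splits):
    """Return the clusters (arrays of row indices) at levels 0..num_splits."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]  # level 0: the whole dataset
    levels = [list(clusters)]
    for _ in range(num_splits):
        # Pick the cluster with the largest maximum divergence (singletons excluded).
        scores = [max_divergence(X[idx]) if len(idx) > 1 else -np.inf for idx in clusters]
        target = int(np.argmax(scores))
        idx = clusters[target]
        mask = np.asarray(split(X[idx]), dtype=bool)  # True -> first child
        children = [idx[mask], idx[~mask]]
        # The two children replace their parent; all other clusters carry over.
        clusters = clusters[:target] + clusters[target + 1:] + children
        levels.append(list(clusters))
    return levels


def split_by_first_coordinate(points):
    """Hypothetical stand-in for a two-cluster split: threshold at the mean."""
    return points[:, 0] < points[:, 0].mean()


# Usage: three well-separated groups on a line, two splits.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0], [20.0, 0.0], [20.1, 0.0]])
levels = divisive_hierarchy(X, split_by_first_coordinate, num_splits=2)
```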

2.3. Choosing the Number of Clusters

We define the effort “e” of splitting a cluster C into A and B as

e (C) = \frac{2}{∣ C ∣} \sum_{i \in C} d ({\bar{p}}_{C}, p_{i}) - \frac{1}{∣ A ∣} \sum_{i \in A} d ({\bar{p}}_{A}, p_{i}) - \frac{1}{∣ B ∣} \sum_{i \in B} d ({\bar{p}}_{B}, p_{i}),

(12)

where $\bar{p}_C$, $\bar{p}_A$ and $\bar{p}_B$ are the centers of the sets $C$, $A$ and $B$ as defined in (11), and $|A|$ denotes the number of fibers in $A$ (similarly for $B$ and $C$). The higher the effort, the better the split, since the effort measures how much splitting reduces the average dissimilarity between the cluster center and its members, relative to the original cluster $C$.
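A small Python/NumPy sketch of the effort in (12) for vector-valued data, under the assumption that $d$ is the vector total square loss $d(m, x) = \|m - x\|^2 / \sqrt{1 + 4\|x\|^2}$ and each center is its closed-form weighted mean; for fibers, the GMM versions of $d$ and of the center in (11) would take their place. Function names are hypothetical. The usage compares a split that separates two groups with one that mixes them.

```python
import numpy as np


def tsl_vec(m, X):
    """Vector total square loss d(m, x_i) for every row x_i of X."""
    return np.sum((X - m) ** 2, axis=1) / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))


def t_center(X):
    """Weighted mean with weights (1 + 4||x_i||^2)^(-1/2)."""
    w = 1.0 / np.sqrt(1.0 + 4.0 * np.sum(X ** 2, axis=1))
    return (w[:, None] * X).sum(axis=0) / w.sum()


def mean_divergence(X):
    """(1/|S|) * sum_{i in S} d(center_S, x_i) for a set S given as the rows of X."""
    return float(tsl_vec(t_center(X), X).mean())


def split_effort(C, mask):
    """Effort (12) of splitting C into A = C[mask] and B = C[~mask]."""
    C = np.asarray(C, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return 2.0 * mean_divergence(C) - mean_divergence(C[mask]) - mean_divergence(C[~mask])


C = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
good = split_effort(C, [True, True, False, False])  # separates the two groups
bad = split_effort(C, [True, False, True, False])   # mixes them
```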

The number of clusters is determined as follows. First, we run the hierarchical clustering for a fixed number of steps, $K$, and compute the efforts $\{e_i\}_{i=1}^{K}$. Then, for each $k \in \{2, \cdots, K\}$, we use total least squares fitting to fit one linear function to the efforts $\{(i, e_i)\}_{i=1}^{k}$ and another to $\{(i, e_i)\}_{i=k+1}^{K}$. Let $ε_k$ denote the total fitting error of the two lines. The optimal number of clusters is then $k^* = \arg\min_k ε_k$, with $k$ ranging over the same set. This is illustrated in Fig. 1. In this paper, we stop the splitting process when the number of clusters reaches $\sqrt{N}$, i.e., $K = \sqrt{N}$, where $N$ is the number of fibers. This number was chosen heuristically.
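The selection rule can be sketched in Python/NumPy as below. Each total least squares fit is computed from the smallest eigenvalue of the $2 \times 2$ scatter matrix of the centered points $(i, e_i)$, which equals the sum of squared orthogonal distances to the best-fit line; segments with fewer than three points are treated as fitting exactly. The candidate range $k \in \{2, \dots, K\}$ follows the text, and the function names are hypothetical.

```python
import numpy as np


def tls_error(points):
    """Sum of squared orthogonal distances to the total-least-squares line."""
    if len(points) < 3:
        return 0.0  # zero, one or two points always lie on some line
    P = np.asarray(points, dtype=float)
    centered = P - P.mean(axis=0)
    # eigvalsh returns eigenvalues in ascending order; the smallest is the TLS residual.
    return float(np.linalg.eigvalsh(centered.T @ centered)[0])


def choose_num_clusters(efforts):
    """k* = argmin_k eps_k, where eps_k is the error of two separate line fits."""
    e = np.asarray(efforts, dtype=float)
    K = len(e)
    pts = np.column_stack([np.arange(1, K + 1), e])  # the points (i, e_i), i = 1..K
    errors = {k: tls_error(pts[:k]) + tls_error(pts[k:]) for k in range(2, K + 1)}
    return min(errors, key=errors.get)


# Usage: efforts that fall steeply and then level off.
efforts = [9.0, 6.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0]
k_star = choose_num_clusters(efforts)
```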

Fig. 1. Determining the number of clusters by total least squares fitting. The x-axis represents the number of clusters and the y-axis the clustering effort "e". The two blue lines are the fitted lines, and the red circle marks the selected optimal number of clusters.

3. EXPERIMENTAL RESULTS

We evaluated our method on both synthetic and real datasets. We first generated a 160 × 160 × 160 diffusion tensor field; the principal eigenvectors of the tensors are shown in Fig. 2(a). The data were generated so that the eigenvectors of the diffusion tensors group into 5 distinct fiber clusters, which provide the ground truth for our clustering method. Tractography was performed using 3D Slicer: streamline tractography was seeded at sub-voxel resolution in every voxel with fractional anisotropy higher than 0.1, with a stopping track curvature of 0.8, an integration step length of 0.5 mm and a minimum fiber length of 10 mm. This produced 400 fibers, shown in Fig. 2(b), with about 50 points on each fiber. The clustering results are shown in Fig. 2(d), and the final cluster centers are shown in bold in Fig. 2(c). The clustering yields exactly 5 clusters that closely match the ground truth, which validates the accuracy of our proposed clustering algorithm.

Fig. 2. Clustering on synthetic data. The principal eigenvectors are shown in (a) and the fibers in (b). The final clustered fiber bundles, along with the final centers, are shown in (c) and (d).

To evaluate our method on real data, we used the human brain DWI dataset (93 × 116 × 93) provided by Alfred Anwander of the Max Planck Institute for Human Neuroscience [17]. The DWIs were acquired on a whole-body 3 T Magnetom TRIO scanner (Siemens Medical Solutions) equipped with an 8-channel head array coil. The twice-refocused spin-echo EPI sequence (TR = 12 s, TE = 100 ms) used 60 diffusion gradient directions with a b-value of 1000 s/mm². The DTI image was estimated using the robust variational non-local means framework of [18]. Tractography was performed in 3D Slicer using Labelmap Seeding, seeding at sub-voxel resolution (2 mm) in every voxel with fractional anisotropy higher than 0.1, with a stopping track curvature of 0.8 and an integration step length of 0.5 mm. The minimum and maximum fiber lengths were 10 mm and 800 mm, respectively. In total, 611 fiber tracts were obtained. The fibers are shown in Fig. 3(a), the initial centers in Fig. 3(b), and the final clustered fiber bundles along with the final centers in Fig. 3(c)(d).

Fig. 3. The fibers obtained from tractography are shown in (a) and the initial centers in (b). The final clustered fiber bundles, along with the final centers, are shown in (c) and (d).

4. CONCLUSIONS

In this paper, we proposed an unsupervised fiber clustering framework that automatically determines the number of clusters. We use a Gaussian mixture model (GMM) to represent each fiber, and the dissimilarity between two fibers is defined as the total square loss between their GMMs. We use the hierarchical total Bregman soft clustering algorithm to group the fibers into clusters, each of which corresponds to a fiber bundle. The number of clusters is determined from the total least squares fitting error of the clustering effort defined in (12). The proposed method performed favorably on both synthetic and real datasets.

Acknowledgments

This research was funded in part by NIH grant NS066340 to Baba C. Vemuri; the University of Florida Alumni Fellowship and the INRIA Internship to Meizhu Liu; and the Association France Parkinson and the French National Research Agency (ANR, Neurodegenerative and Psychiatric Diseases) to Rachid Deriche.

References

1. Basser PJ, Mattiello J, LeBihan D. Estimation of the effective self-diffusion tensor from the NMR spin echo. J Magn Res. 1994;103(3):247–254. doi: 10.1006/jmrb.1994.1037.
2. O'Donnell L, Westin CF. White matter tract clustering and correspondence in populations. MICCAI. 2005:3570–3577.
3. Brun A, Knutsson H, Park HJ, Shenton ME, Westin CF. Clustering fiber traces using normalized cuts. MICCAI. 2004:368–375. doi: 10.1007/b100265.
4. Zhang S, Laidlaw DH. DTI fiber clustering in the whole brain. IEEE Visualization. 2004:28.
5. Ding Z, Gore JC, Anderson AW. Classification and quantification of neuronal fiber pathways using diffusion tensor MRI. MRM. 2003;49:716–721. doi: 10.1002/mrm.10415.
6. Maddah M, Grimson WEL, Warfield SK, Wells WM. A unified framework for clustering and quantitative analysis of white matter fiber tracts. MedIA. 2008;12:191–202. doi: 10.1016/j.media.2007.10.003.
7. Batchelor PG, Calamante F, Tournier JD, Atkinson D, Hill DLG, Connelly A. Quantification of the shape of fiber tracts. MRM. 2006;55(4):894–903. doi: 10.1002/mrm.20858.
8. Corouge I, Fletcher PT, Joshi S, Gouttard S, Gerig G. Fiber tract-oriented statistics for quantitative diffusion tensor MRI analysis. MedIA. 2005:786–798. doi: 10.1016/j.media.2006.07.003.
9. Glaunes J, Qiu A, Miller M, Younes L. Large deformation diffeomorphic metric curve mapping. IJCV. 2008;80(3):317–336. doi: 10.1007/s11263-008-0141-9.
10. Corouge I, Gouttard S, Gerig G. Towards a shape model of white matter fiber bundles using diffusion tensor MRI. ISBI. 2004:344–347.
11. Li H, Xue Z, Guo L, Liu T, Hunter J, Wong STC. A hybrid approach to automatic clustering of white matter fibers. NeuroImage. 2010;49:1249–1258. doi: 10.1016/j.neuroimage.2009.08.017.
12. Wang Q, Yap P, Wu G, Shen D. Fiber modeling and clustering based on neuroanatomical features. MICCAI. 2011. doi: 10.1007/978-3-642-23629-7_3.
13. Tsai A, Westin CF, Hero AO, Willsky AS. Fiber Tract Clustering on Manifolds With Dual Rooted-Graphs. IEEE CVPR. 2007;51:1–6.
14. Wassermann D, Bloy L, Kanterakis E, Verma R, Deriche R. Unsupervised white matter fiber clustering and tract probability map generation: Applications of a Gaussian process framework for white matter fibers. NeuroImage. 2010;51:228–241. doi: 10.1016/j.neuroimage.2010.01.004.
15. Vemuri BC, Liu M, Amari S, Nielsen F. Total Bregman Divergence and its Applications to DTI Analysis. IEEE TMI. 2011;30(2):475–483. doi: 10.1109/TMI.2010.2086464.
16. Liu M, Vemuri BC, Amari S, Nielsen F. Shape Retrieval Using Hierarchical Total Bregman Soft Clustering. TPAMI. 2012. doi: 10.1109/TPAMI.2012.44.
17. Makuuchi M, Bahlmann J, Anwander A, Friederici AD. Segregating the core computational faculty of human language from working memory. Proc National Acad Sciences. 2009;106(20):8362–8367. doi: 10.1073/pnas.0810928106.
18. Liu M, Vemuri BC, Deriche R. Simultaneous Smoothing & Estimation of DTI via Robust Variational Non-local Means. MICCAI CDMRI. 2011. doi: 10.1016/j.neuroimage.2012.11.012.
