Abstract
Diffusion tensor imaging cannot represent heterogeneous fascicle orientations in one voxel. Various models have been proposed to overcome this limitation. Among them, multi-fascicle models are of great interest to characterize and compare white matter properties. However, existing methods fail to estimate their parameters from conventional diffusion sequences with the desired accuracy. In this paper, we provide a geometric explanation for this problem. We demonstrate that there is a manifold of indistinguishable multi-fascicle models for single-shell data, and that the manifolds for different b-values intersect tangentially at the true underlying model, making the estimation very sensitive to noise. To regularize it, we propose to learn a prior over the model parameters from data acquired at several b-values in an external population of subjects. We show that this population-informed prior enables, for the first time, accurate estimation of multi-fascicle models from single-shell data as commonly acquired in clinical contexts. The approach is validated on synthetic and in vivo data of healthy subjects and patients with autism. We apply it in population studies of the white matter microstructure in autism spectrum disorder. This approach enables novel investigations from large existing DWI datasets in normal development and in disease.
Keywords: Diffusion, Single-Shell, Generative Models, Estimation
1 Introduction
Diffusion tensor imaging is unable to represent the signal arising from crossing fascicles. Various approaches have been proposed to overcome this limitation. Among them, generative models such as multi-tensor models [2,3] seek to represent the signal contribution from different populations of water molecules. Based on biological modelling, they are of great interest to characterize and compare white-matter properties. However, estimating their parameters from conventional diffusion data has proven inaccurate.
Recent works have suggested that part of this inaccuracy is explained by the ill-posedness of the problem and not only by the imaging nuisance [3,4]. To regularize the estimation of models with a single anisotropic tensor, elaborate spatial priors have been proposed [2], and it was shown that acquiring additional b-values improves the analysis of the isotropic fraction [1]. For N-tensor models, it was proposed to fix the tensor eigenvalues [4], which solves the ill-posedness problem but reduces the amount of microstructural information contained in the model. No method has been proposed to regularize the estimation of an N-fascicle model while keeping all its degrees of freedom. Furthermore, there is a strong need for a strategy to estimate multi-fascicle models from conventional single-shell data, due to their wide availability in clinical settings. Section 2 analyzes the estimation problem from a geometric point of view. Section 3 develops an estimator based on a prior informed by an external population of subjects. Section 4 presents results and Section 5 concludes. Conclusions about estimating an N-tensor model apply to all generative models that include a multi-tensor component.
2 Manifolds of Equivalent Models at a Given B-value
A multi-fascicle model is represented as a mixture of single fascicle models. In the multi-tensor formalism, the generative model for the formation of the diffusion signal S for a b-value b and a gradient direction g is:
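$$S = S_0 \sum_{i=1}^{N} f_i \, e^{-b\, g^T D_i\, g} \tag{1}$$
where Di and fi are the tensor and the volumetric fraction of fascicle i. Since $\gamma_i e^{-\log \gamma_i} = 1$ and $g^T g = 1$, all multi-fascicle models with fractions $\gamma_i f_i$ and tensors $D_i + \frac{\log \gamma_i}{b} I$ produce the same signal:
$$S = S_0 \sum_{i=1}^{N} \gamma_i f_i \, e^{-b\, g^T \left(D_i + \frac{\log \gamma_i}{b} I\right) g} \tag{2}$$
The tensors remain positive definite as long as $\gamma_i > e^{-b \lambda_i^{\min}}$, where $\lambda_i^{\min}$ is the lowest eigenvalue of Di. Each of these models is uniquely identified by its vector $(\mathbf{f}, \boldsymbol{\lambda})$ of fractions and lowest eigenvalues. The set of all models respecting Equation (2) is a manifold of dimension (N − 1) defined by the implicit equations (we let $\lambda_i \equiv \lambda_i^{\min}$):
$$f_i = \gamma_i f_i^0, \qquad \lambda_i = \lambda_i^0 + \frac{\log \gamma_i}{b}, \qquad \sum_{i=1}^{N} \gamma_i f_i^0 = 1 \tag{3}$$
where $(\mathbf{f}^0, \boldsymbol{\lambda}^0)$ is the true unknown model (Fig. 1(a)). Since these equations depend on b, so will the manifold. Acquiring diffusion images at different b-values amounts to defining different such manifolds. Let us investigate how those manifolds intersect at the point of interest $(\mathbf{f}^0, \boldsymbol{\lambda}^0)$. The explicit equation of the hypersurface λN (λ1, …, λN−1) obtained by eliminating the γ’s between equations (3) is:
$$\lambda_N = \lambda_N^0 + \frac{1}{b} \log\left( \frac{1 - \sum_{i=1}^{N-1} f_i^0 \, e^{b(\lambda_i - \lambda_i^0)}}{f_N^0} \right) \tag{4}$$
The normal vector to the hypersurface is $\left(\frac{\partial \lambda_N}{\partial \lambda_1}, \dots, \frac{\partial \lambda_N}{\partial \lambda_{N-1}}, -1\right)$. Its k-th component evaluated at the true model is:
$$\left. \frac{\partial \lambda_N}{\partial \lambda_k} \right|_{\boldsymbol{\lambda}^0} = -\frac{f_k^0}{f_N^0}, \qquad k = 1, \dots, N-1 \tag{5}$$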
Remarkably, this normal vector (and thereby the tangent hyperplane) does not depend on b at the point of interest. In other words, at the first-order approximation, the manifolds at all b-values coincide locally, explaining the high sensitivity to noise encountered when optimizing the parameters of a multi-fascicle model (Fig. 1(b)).
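The argument of this section can be checked numerically. The sketch below assumes the multi-tensor signal S = S0 Σ fi exp(−b gᵀDi g), builds an equivalent model by scaling the fractions by γi and adding (log γi / b) I to each tensor, and writes the hypersurface as λN = λN⁰ + (1/b) log((1 − Σ fi⁰ exp(b(λi − λi⁰))) / fN⁰). All numerical values (fractions, diffusivities, gradient directions) and all function names are illustrative assumptions, not taken from the experiments.

```python
import math

def signal(fractions, tensors, b, g, s0=1.0):
    """Multi-tensor signal: S0 * sum_i f_i * exp(-b * g^T D_i g)."""
    total = 0.0
    for f, D in zip(fractions, tensors):
        quad = sum(g[r] * D[r][c] * g[c] for r in range(3) for c in range(3))
        total += f * math.exp(-b * quad)
    return s0 * total

def equivalent_model(fractions, tensors, gammas, b):
    """Fractions gamma_i * f_i and tensors D_i + (log gamma_i / b) * I."""
    new_f = [gam * f for gam, f in zip(gammas, fractions)]
    new_D = [[[D[r][c] + (math.log(gam) / b if r == c else 0.0)
               for c in range(3)] for r in range(3)]
             for gam, D in zip(gammas, tensors)]
    return new_f, new_D

def lambda_last(lams, lam0, f0, b):
    """Lowest eigenvalue of the last tensor on the manifold, given the others."""
    rest = 1.0 - sum(f * math.exp(b * (l - l0))
                     for f, l, l0 in zip(f0[:-1], lams, lam0[:-1]))
    return lam0[-1] + math.log(rest / f0[-1]) / b

def slope_at_truth(k, lam0, f0, b, h=1e-8):
    """Central finite difference of lambda_N with respect to lambda_k."""
    up = list(lam0[:-1]); up[k] += h
    down = list(lam0[:-1]); down[k] -= h
    return (lambda_last(up, lam0, f0, b) - lambda_last(down, lam0, f0, b)) / (2 * h)

# Hypothetical two-fascicle model (diffusivities in mm^2/s, b in s/mm^2).
f_true = [0.6, 0.4]
D_true = [[[1.7e-3, 0, 0], [0, 0.3e-3, 0], [0, 0, 0.3e-3]],
          [[0.3e-3, 0, 0], [0, 1.7e-3, 0], [0, 0, 0.3e-3]]]
gammas = [1.1, 0.85]  # chosen so that sum_i gamma_i * f_i = 1
f_eq, D_eq = equivalent_model(f_true, D_true, gammas, b=1000)

s = 1 / math.sqrt(3)
directions = [(1, 0, 0), (0, 1, 0), (s, s, s)]  # unit gradient directions
gap_same_b = max(abs(signal(f_true, D_true, 1000, g) - signal(f_eq, D_eq, 1000, g))
                 for g in directions)
gap_other_b = max(abs(signal(f_true, D_true, 3000, g) - signal(f_eq, D_eq, 3000, g))
                  for g in directions)

lam0 = [0.3e-3, 0.3e-3]  # lowest eigenvalues of the two true tensors
slopes = [slope_at_truth(0, lam0, f_true, b) for b in (1000, 2000, 3000)]
print(gap_same_b, gap_other_b, slopes)
```

In this toy setting the two models are indistinguishable at b = 1000 but not at b = 3000, and the finite-difference slope stays close to −f1/f2 for every b-value tried, which is the first-order coincidence of the manifolds described above.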
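At the second-order approximation, the manifold is characterized by the Hessian matrix of λN (λ1, …, λN−1), which at the true model reads:
$$H_b = -\frac{b}{f_N^2} \left( f_N \, \mathrm{diag}(\tilde{f}) + \tilde{f} \tilde{f}^T \right)$$
where f̃ = [f1, …, fN−1]T and fN are the fractions of the true model. The difference between the Hessian matrices at two different b-values, b and b′ > b, is positive definite since, for all x ≠ 0, we have
$$x^T \left( H_b - H_{b'} \right) x = \frac{b' - b}{f_N^2} \left( f_N \sum_{i=1}^{N-1} f_i x_i^2 + \left( \tilde{f}^T x \right)^2 \right) > 0 \tag{6}$$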
Therefore, there exists no direction x along which the two manifolds have the same curvature. Consequently, the true model is locally the only intersection of all manifolds. Given the difference (6), it appears that a wider range of b-values leads to a larger difference between their manifolds, which should in turn improve the accuracy of the estimation (ignoring the potential impact of b on noise).
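The curvature argument can also be illustrated with a few lines of Python. This is a sketch under stated assumptions: the hypersurface is taken to be λN = λN⁰ + (1/b) log((1 − Σ fi⁰ exp(b(λi − λi⁰))) / fN⁰), whose Hessian at the true model is −(b / fN²)(fN diag(f̃) + f̃ f̃ᵀ). The three-fascicle fractions, the test directions and the function names are hypothetical.

```python
import math

def hessian_at_truth(f0, b):
    """Closed-form Hessian at the true model: -(b / f_N^2) (f_N diag(f~) + f~ f~^T)."""
    f_tilde, f_last = f0[:-1], f0[-1]
    n = len(f_tilde)
    return [[-(b / f_last ** 2) * ((f_last * f_tilde[k] if k == l else 0.0)
                                   + f_tilde[k] * f_tilde[l])
             for l in range(n)] for k in range(n)]

def quad_form(H, x):
    """x^T H x for a square matrix stored as nested lists."""
    return sum(x[k] * H[k][l] * x[l] for k in range(len(x)) for l in range(len(x)))

def curvature_gap(f0, b_low, b_high, x):
    """x^T (H_b - H_b') x for b' > b, the quantity bounded in Equation (6)."""
    return (quad_form(hessian_at_truth(f0, b_low), x)
            - quad_form(hessian_at_truth(f0, b_high), x))

def lambda_last(lams, lam0, f0, b):
    """Hypersurface: lowest eigenvalue of the last tensor given the others."""
    rest = 1.0 - sum(f * math.exp(b * (l - l0))
                     for f, l, l0 in zip(f0[:-1], lams, lam0[:-1]))
    return lam0[-1] + math.log(rest / f0[-1]) / b

def second_difference(k, lam0, f0, b, h=1e-6):
    """Finite-difference estimate of the k-th diagonal Hessian entry."""
    up = list(lam0[:-1]); up[k] += h
    down = list(lam0[:-1]); down[k] -= h
    mid = lambda_last(lam0[:-1], lam0, f0, b)
    return (lambda_last(up, lam0, f0, b) - 2 * mid
            + lambda_last(down, lam0, f0, b)) / h ** 2

f_true = [0.5, 0.3, 0.2]            # hypothetical three-fascicle fractions
lam0 = [0.3e-3, 0.3e-3, 0.3e-3]     # hypothetical lowest eigenvalues (mm^2/s)
directions = [(1, 0), (0, 1), (1, 1), (1, -1), (0.3, -0.9)]

gaps_narrow = [curvature_gap(f_true, 1000, 2000, x) for x in directions]
gaps_wide = [curvature_gap(f_true, 1000, 3000, x) for x in directions]
H_1000 = hessian_at_truth(f_true, 1000)
fd_11 = second_difference(0, lam0, f_true, 1000)
print(gaps_narrow, gaps_wide, H_1000[0][0], fd_11)
```

The gap is positive in every direction tried and doubles when b′ − b doubles, consistent with the remark that a wider range of b-values separates the manifolds more strongly.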
When an isotropic compartment $f_{iso} e^{-b D_{iso}}$ is added to the model, one can show that the above development remains valid with an unchanged N if Diso is known, and by considering an (N + 1)-fascicle model if Diso also needs to be optimized.
3 Posterior Predictive Distribution of the Parameters
While all models of (3) are equally compatible with the observed DWI at a given b-value, they are not all equally likely from a biological point of view. This knowledge can be learnt from available observations at several b-values of a fascicle i in mi subjects, and incorporated in the estimation as a prior over the parameters (fi, Di) (Fig. 1(c)). If the effect of the fascicle properties on partial voluming is negligible, and if the properties of one fascicle are independent of those of another, then the prior can be expressed as:
$$P(\mathbf{f}, D_1, \dots, D_N) = P(\mathbf{f}) \prod_{i=1}^{N} P(D_i) \tag{7}$$
The fractions are not independent since they sum to 1. However, we assume that any fraction fi is independent of the relative proportions of others fj/(1 − fi). This neutral vector assumption naturally leads to the Dirichlet distribution:
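$$P(\mathbf{f}) = \mathrm{Dir}(\mathbf{f}; \boldsymbol{\alpha}) = \frac{\Gamma\left(\sum_{i=1}^{N} \alpha_i\right)}{\prod_{i=1}^{N} \Gamma(\alpha_i)} \prod_{i=1}^{N} f_i^{\alpha_i - 1} \tag{8}$$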
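For illustration, the Dirichlet density over the fractions can be evaluated in log form with the standard library alone. This is a minimal sketch: the function name and the example concentration parameters are assumptions made for the example, not values learnt from the population.

```python
import math

def dirichlet_logpdf(fractions, alphas):
    """Log-density of a Dirichlet distribution at a point of the simplex."""
    # Points outside the open simplex get zero density.
    if abs(sum(fractions) - 1.0) > 1e-9 or min(fractions) <= 0.0:
        return float("-inf")
    log_norm = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    return log_norm + sum((a - 1.0) * math.log(f)
                          for a, f in zip(alphas, fractions))

# With all parameters equal to 1 the density is uniform over the simplex.
uniform = dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])

# Hypothetical two-fascicle prior whose mode is at fractions (0.6, 0.4).
peaked_near = dirichlet_logpdf([0.6, 0.4], [7.0, 5.0])
peaked_far = dirichlet_logpdf([0.9, 0.1], [7.0, 5.0])
print(uniform, peaked_near, peaked_far)
```

Working with the log-density avoids underflow when this term is later added to a log-likelihood.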
To prevent negative eigenvalues of the tensors, the prior knowledge about Di can be described as a multivariate Gaussian distribution over their logarithm [5]:
(9) |
In general, Σi has 21 free parameters, which may overfit the usually small training dataset. For DTI, it is suggested in [5] to constrain Σi to be orthogonally invariant, imposing the following structure that depends only on σi and ιi:
This structure yields a closed-form solution for the maximum likelihood [5]:
(10) |
(11) |
where 〈·, ·〉t is defined by 〈A, B〉t = Tr(AB) − t Tr(A)Tr(B). The ML estimator may be unreliable for compartments with only a few observations. This uncertainty is accounted for by replacing point estimates of the prior parameters θ by posterior distributions and integrating over all possible θ. This yields the posterior predictive distribution (PPD), which contains all the knowledge about new observations that we learn from previous observations. Its derivation requires the definition of hyperpriors over θ and is closed-form if we select conjugate hyperpriors. A Gaussian hyperprior Mi ~ N(M0, Λ0) is conjugate for the tensor part of (7), assuming a deterministic Σi = Σ̂i. We set Λ0 = B(1, 0) and M0 = log Diso to keep it weakly informative (this hyperprior merely encodes the order of magnitude of diffusivity at 37°C). The PPD over the tensors is then Gaussian in log Di, with parameters:
(12) |
(13) |
For the parameters αi, a conjugate hyperprior is the Dirichlet distribution. We set all its parameters to 1, making it uniform over the simplex. The resulting PPD is again a Dirichlet distribution, whose parameters treat the observed fractions as frequency counts since they are samples of fi rather than samples from a multinomial parameterized by fi. The complete PPD is (up to a constant factor):
(14) |
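The fraction part of this construction can be sketched as follows. The update rule used here is an assumption made for illustration: starting from a uniform Dirichlet hyperprior (all parameters equal to 1), each observed fraction is added to the corresponding parameter as if it were a fractional count. The function names and the three example subjects are hypothetical.

```python
def ppd_dirichlet_parameters(observed_fractions, hyper=1.0):
    """Dirichlet parameters after adding observed fractions as fractional counts.

    observed_fractions: one fraction vector per subject, each summing to 1.
    hyper: value of every hyperprior parameter (1.0 gives a uniform hyperprior).
    """
    n = len(observed_fractions[0])
    return [hyper + sum(obs[i] for obs in observed_fractions) for i in range(n)]

def dirichlet_mean(alphas):
    """Mean of a Dirichlet distribution with the given parameters."""
    total = sum(alphas)
    return [a / total for a in alphas]

# Hypothetical fractions of a two-fascicle voxel observed in three subjects.
observations = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]]
alphas = ppd_dirichlet_parameters(observations)
mean = dirichlet_mean(alphas)
print(alphas, mean)
```

With only three subjects, the mean of the resulting distribution sits between the sample average of the fractions and the uniform value, which reflects the uncertainty that the PPD is meant to carry for compartments with few observations.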
We incorporate this PPD as a prior in the estimation. We assume Gaussian noise on the DWI measurements yk since they are acquired on a single shell typically at b=1000 for which noise is approximately Gaussian. The maximum a posteriori estimator at each voxel amounts to maximizing the following for f and D:
(15) |
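To make the role of the prior concrete, the sketch below assumes that the maximum a posteriori objective is the sum of a Gaussian log-likelihood, −Σk (yk − Sk(f, D))² / (2σ²), and of the log of the prior. The prior used here is a hypothetical stand-in (an unnormalized Dirichlet on the fractions only), not the full PPD, and all numerical values are illustrative.

```python
import math

def signal(fractions, tensors, b, g, s0=1.0):
    """Multi-tensor signal: S0 * sum_i f_i * exp(-b * g^T D_i g)."""
    total = 0.0
    for f, D in zip(fractions, tensors):
        quad = sum(g[r] * D[r][c] * g[c] for r in range(3) for c in range(3))
        total += f * math.exp(-b * quad)
    return s0 * total

def log_posterior(fractions, tensors, data, b, noise_var, log_prior, s0=1.0):
    """Gaussian log-likelihood (up to a constant) plus the log of the prior."""
    residual = sum((y - signal(fractions, tensors, b, g, s0)) ** 2
                   for g, y in data)
    return -residual / (2.0 * noise_var) + log_prior(fractions, tensors)

def flat_log_prior(fractions, tensors):
    """No prior knowledge: every model is equally likely a priori."""
    return 0.0

def toy_log_prior(fractions, tensors, alphas=(7.0, 5.0)):
    """Hypothetical stand-in for the PPD: unnormalized Dirichlet on fractions."""
    return sum((a - 1.0) * math.log(f) for a, f in zip(alphas, fractions))

def diag(d1, d2, d3):
    return [[d1, 0.0, 0.0], [0.0, d2, 0.0], [0.0, 0.0, d3]]

b, s0, noise_var = 1000.0, 400.0, 80.0
f_true = [0.6, 0.4]
D_true = [diag(1.7e-3, 0.3e-3, 0.3e-3), diag(0.3e-3, 1.7e-3, 0.3e-3)]

# A second model on the same single-shell manifold (gamma = 1.1 and 0.85).
gammas = [1.1, 0.85]
f_alt = [gam * f for gam, f in zip(gammas, f_true)]
D_alt = [diag(*(D[r][r] + math.log(gam) / b for r in range(3)))
         for gam, D in zip(gammas, D_true)]

s = 1 / math.sqrt(3)
directions = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (s, s, s)]
data = [(g, signal(f_true, D_true, b, g, s0)) for g in directions]  # noise-free

flat_true = log_posterior(f_true, D_true, data, b, noise_var, flat_log_prior, s0)
flat_alt = log_posterior(f_alt, D_alt, data, b, noise_var, flat_log_prior, s0)
map_true = log_posterior(f_true, D_true, data, b, noise_var, toy_log_prior, s0)
map_alt = log_posterior(f_alt, D_alt, data, b, noise_var, toy_log_prior, s0)
print(flat_true, flat_alt, map_true, map_alt)
```

Without a prior, the two models score the same on single-shell data. Once the prior term is added, the objective separates them, which is the regularizing effect sought here.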
The influence of the noise is analyzed in the next section. In practice, the prior is built from data acquired in completely different subjects at several b-values. All these subjects are registered to a multi-fascicle atlas as in [7]. Following alignment, tensors from all subjects at each voxel are clustered into N compartments as in [6]. Each cluster provides the sets of observed fractions and tensors for one compartment. The prior is then aligned with an initial estimate (without prior) of the multi-fascicle model. To evaluate (14), all assignments of compartments to tensors are considered and the highest prior value is retained. The BOBYQA algorithm is used to maximize (15), and the number of fascicles is estimated by an F-test as in [3].
4 Results
We compare the models estimated by our method to a ground truth {gi, Gi} with five root mean square metrics, ΔFA, ΔMD, Fro, ΔF and Δiso defined by:
Synthetic Phantom Experiment
DWI were simulated under Rician noise from a phantom containing an isotropic compartment and 0 to 3 tensors of various properties with S0=400. The prior was built from 20 datasets of 90 DWI at b=1000, 2000 and 3000. The accuracy was evaluated with 20 datasets of 30 DWI at b=1000 in three scenarios. First, the noise variance was increased from 40 to 120. Second, the FA of the phantom was offset by −10% to +10% without changing the prior, to simulate patient data with a prior built from healthy subjects. Third, random deformations of 0 to 2 voxels were applied to the prior to simulate registration errors. In the last two scenarios, the noise variance was 80. In all scenarios and for all metrics, incorporating the prior significantly improved the accuracy of the estimation (one-tail paired t-test: $p < 10^{-6}$) (Fig. 2).
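Rician noise of the kind used in such simulations can be generated as the magnitude of a complex signal with Gaussian noise on both channels. The sketch below assumes that the quoted noise variance is the variance of each Gaussian channel. The function name, the random seed and the number of samples are illustrative choices.

```python
import math
import random

def add_rician_noise(signal_value, noise_var, rng):
    """Magnitude of a complex signal whose real and imaginary parts carry
    independent zero-mean Gaussian noise of variance noise_var."""
    sigma = math.sqrt(noise_var)
    real = signal_value + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)

rng = random.Random(0)          # fixed seed so that the example is repeatable
s0, noise_var = 400.0, 80.0     # values quoted for the phantom experiment

noisy_high = [add_rician_noise(s0, noise_var, rng) for _ in range(20000)]
noisy_zero = [add_rician_noise(0.0, noise_var, rng) for _ in range(20000)]
mean_high = sum(noisy_high) / len(noisy_high)
mean_zero = sum(noisy_zero) / len(noisy_zero)
print(mean_high, mean_zero)
```

At high signal the noisy magnitude stays close to the noise-free value, whereas at zero signal it follows a Rayleigh distribution with mean σ√(π/2), which is why the Gaussian approximation is only reasonable when the signal is well above the noise floor.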
In Vivo Data Experiment
Eighteen healthy subjects and 10 subjects with autism were imaged to test the method, and an extra 13 healthy subjects were imaged to build the prior. For all subjects, DWI at resolution 1.7 × 1.7 × 2 mm³ were acquired with a Siemens 3T Trio with a 32-channel head coil using the CUSP-45 sequence [3]. This includes 30 gradients on a single shell at b = 1000 and 15 gradients with b-values up to 3000. For each test subject, all 45 DWI were first used to estimate a multi-fascicle model considered as a ground truth. Estimations using the single-shell subset only were then compared to it. Four strategies were compared: estimation without prior, estimation by fixing all tensors to a globally optimized value, and estimation with the prior assuming a noise level of either 20 or 500. Results in Fig. 3 show that estimations which incorporate a prior outperform other strategies. Estimations with a noise level of 500 are significantly better than estimations without prior for all metrics and, remarkably, for both healthy controls and ASD patients (one-tail paired t-test: $p < 10^{-6}$). The true noise level of DWI is arguably closer to 500 than 20. However, estimations with a noise level of 20 remain more accurate than estimations without prior, indicating that the population-informed prior improves the model accuracy even for crude estimates of the noise level. Empirical estimation of this noise level is left for future work. Finally, fixing the fascicle response results in accuracies that vary strongly among quality metrics and, furthermore, only provides average information about the brain microstructure, which is not suitable in most studies.
Application to Population Studies
One could be concerned that the improved accuracy brought by the prior would come with a severe shrinkage of the estimated parameters towards the mean of the population. This would prevent its use in population studies. To address this concern, we conducted two population studies of autism spectrum disorder (ASD) using the proposed estimator. The first one focused on fascicle properties in the left arcuate fasciculus by analyzing the FA along the median tract. The second study investigated whether an increased extracellular volume fraction fiso is observed in ASD. Less restricted diffusion may be related to the presence of edema, thinner axons, and neuroinflammation [1]. The latter has been proposed as a possible cause of autism. Corrections for multiple comparisons were based on cluster-size statistics in 1000 permutations with a threshold on t-scores of 3. As presented in Fig. 4, the first study revealed decreased FA in the arcuate fasciculus of patients with ASD, in line with most recent studies of autism. The second study revealed one cluster of significantly increased unrestricted diffusion (permutation test: p < 0.003). Without the prior, none of these findings were observed (p > 0.1). These studies show that the use of a prior in the estimation preserves contrasts of diffusion properties between groups, so that single-shell HARDI data can be used in large population studies based on multi-fascicle models.
5 Conclusion
Multi-fascicle models cannot be estimated from conventional single-shell HARDI data because a manifold of models produces the same diffusion signals. However, we showed that a posterior predictive distribution over the model parameters can be learnt from data acquired at several b-values in an external population. By incorporating this population-informed prior in the maximum a posteriori estimator of the parameters, we are able to estimate accurate multi-fascicle models from data at a single b-value. This method thus opens new opportunities for population studies with the large number of available clinical diffusion images.
Acknowledgments
This work was supported in part by NIH grants 1U01NS082320, R01 EB008015, R03 EB008680, R01 LM010033, R01 EB013248, P30 HD018655, BCH TRP, R42 MH086984 and UL1 TR000170.
Footnotes
MT and NB are research fellows of the F.R.S.-FNRS. MT is also research fellow of the B.A.E.F.
References
1. Pasternak O, Shenton ME, Westin C-F. Estimation of extracellular volume from regularized multi-shell diffusion MRI. In: Ayache N, Delingette H, Golland P, Mori K, editors. MICCAI 2012, Part II. LNCS. Vol. 7511. Springer; Heidelberg: 2012. pp. 305–312.
2. Pasternak O, Sochen N, Gur Y, Intrator N, Assaf Y. Free water elimination and mapping from diffusion MRI. Magnet Reson Med. 2009;62(3):717–730. doi: 10.1002/mrm.22055.
3. Scherrer B, Warfield SK. Parametric representation of multiple white matter fascicles from cube and sphere diffusion MRI. PLoS one. 2012;7(11):e48232. doi: 10.1371/journal.pone.0048232.
4. Schultz T, Westin C-F, Kindlmann G. Multi-diffusion-tensor fitting via spherical deconvolution: a unifying framework. In: Jiang T, Navab N, Pluim JPW, Viergever MA, editors. MICCAI 2010, Part I. LNCS. Vol. 6361. Springer; Heidelberg: 2010. pp. 674–681.
5. Schwartzman A, Mascarenhas WF, Taylor JE. Inference for eigenvalues and eigenvectors of gaussian symmetric matrices. Ann Stat. 2008:2886–2919.
6. Taquet M, Scherrer B, Benjamin C, Prabhu S, Macq B, Warfield S. Interpolating multi-fiber models by gaussian mixture simplification. IEEE ISBI. 2012:928–931.
7. Taquet M, Scherrer B, Commowick O, Peters J, Sahin M, Macq B, Warfield SK. Registration and analysis of white matter group differences with a multi-fiber model. In: Ayache N, Delingette H, Golland P, Mori K, editors. MICCAI 2012, Part III. LNCS. Vol. 7512. Springer; Heidelberg: 2012. pp. 313–320.