Abstract
In this work we propose a novel Gaussian process-based spatio-temporal model of time series of images. By assuming separability of spatial and temporal processes, we provide a very efficient and robust formulation for the marginal likelihood computation and the posterior prediction. The model adaptively accounts for local spatial correlations of the data, and the covariance structure is effectively parameterised by the Kronecker product of covariance matrices of very small size, each encoding only a single direction in space. We provide a simple and flexible framework for within- and between-subject modelling and prediction. In particular, we introduce the Hoffman-Ribak method for efficient inference on the posterior process and its uncertainty. The proposed framework is applied in the context of longitudinal modelling in Alzheimer’s disease. We first demonstrate the advantage of our non-parametric method for modelling within-subject structural changes. The results show that the non-parametric model matches or outperforms conventional parametric methods. The framework is then extended to optimize complex parametrized covariate kernels. Using Bayesian model comparison via the marginal likelihood, the framework enables the comparison of different hypotheses about individual change processes in images.
1. Introduction
Modelling longitudinal changes in organs is fundamental for the understanding of biological and pathological processes. For instance, the development of a spatio-temporal model of disease progression in Alzheimer’s disease (AD) from time series of magnetic resonance images (MRIs) would be highly valuable for the fundamental understanding of the disease process, for diagnostic purposes and individual predictions, and for testing the efficacy of disease-modifying drugs in clinical trials.
The consistent modelling and prediction of spatio-temporal changes in longitudinal MRI is still an important challenge from both methodological and computational perspectives. In fact, flexible modelling instruments are required in order to robustly capture meaningful pathological accelerations specific to sensitive brain regions. Moreover, since a biological model of local brain changes is often unknown, it is important to develop optimal models in terms of statistical complexity.
Many of the previous works on spatio-temporal modelling of image time series are based on non-linear image registration, describing signal differences between images as local spatial transformations [1,2,3,4]. However, statistical inference in registration models is often limited by their computational complexity; moreover, image registration is generally not flexible enough to perform model comparison and clinical prediction, or to account for covariates and for within- and between-subject heterogeneity.
A statistical focus on the modelling of image time series is commonly provided by parametric general linear models (GLM) [5]. However, GLM approaches are often limited by the arbitrary choice of model complexity and of the spatial resolution at which the data is analyzed. Even though flexible non-parametric models have been proposed for the analysis of spatio-temporal signals in brain images [6,7], their computational complexity still prevents their straightforward application to time series of high-resolution MRIs. Non-parametric Gaussian process (GP) models have emerged as a flexible and elegant Bayesian approach for prediction and modelling in many applications [8], and have recently been successfully introduced to the field of neuroimaging, e.g. in the context of single-case inference in aging [9]. However, the application of GPs to the voxel-wise modelling of image time series is to date very challenging, since the specification of the joint covariance structure of the image features is in general computationally prohibitive.
In this work we introduce a generative model of spatio-temporal changes based on GPs, to provide a flexible and computationally efficient approach to the analysis of aligned image time series by accounting for spatial and temporal correlation. In particular, by assuming a local spatial correlation model and the separability between spatial and temporal changes, we introduce a very efficient formulation based on a covariance structure parameterized by the Kronecker product of small size covariance matrices [10]. The proposed model extends GLM approaches by providing a flexible and efficient statistical tool for the analysis of image features from spatially aligned time series, for instance by allowing statistical inference on the model parameters.
The paper is organized as follows. In Section 2 we propose our generative model of longitudinal changes in image time series, while in Section 3 we provide computationally tractable optimization and prediction schemes. We also introduce a novel computational scheme based on the Hoffman-Ribak method for statistical inference in high-dimensional GP-based spatio-temporal models. Finally, in Sections 4 and 5, we apply the model to longitudinal data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) for 1) within-subject modelling and prediction of local and regional longitudinal brain changes, and 2) group-wise joint modelling of local ventricle growth rates based on socio-demographics, genetic factors and clinical scores.
2. A generative model for within-subject image time series
Let u = (x, y, z) be the 3-dimensional spatial coordinate system and t the temporal dimension. We consider the image time series I as a discretely sampled spatio-temporal signal of dimensions $N \times N \times N \times N_T$, where $N$ is the dimension of the sampling grid on a single spatial axis, and $N_T$ is the number of time points³. In the following sections we represent the image time series I as a one-dimensional array of length $N^3 N_T$. We model the image time series I(u, t) as a realization of a latent spatio-temporal process f(u, t) with additive noise:
$$I(u, t) = f(u, t) + \epsilon(u, t). \qquad (1)$$
The latent signal $f$ is modelled as a GP with zero mean and covariance $\Sigma$, while $\epsilon$ is assumed to be i.i.d. Gaussian measurement noise, $\epsilon(u, t) \sim \mathcal{N}(0, \sigma^2)$. We first assume that the spatial and temporal processes are separable, so that the covariance matrix $\Sigma$ factorises as the Kronecker product of independent spatial and temporal covariance matrices: $\Sigma = \Sigma_S \otimes \Sigma_T$.
This is a valid modelling assumption when the temporal properties of the signal are similar across space; for instance, when analysing within-subject time series of brain MRIs in AD, the expected pathological change rates are generally mild and slowly varying across the brain. Second, a central assumption made in this paper is that the spatial dependencies of the signal are local, i.e. that the image intensities vary smoothly and are correlated within a spatial neighbourhood of radius $l_s$. We note that our assumptions about separability and stationarity are compatible with the spatio-temporal correlation models commonly assumed by registration-based approaches.
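A reasonable choice for such a local spatial covariance structure is the squared exponential model

$$\Sigma_S(u_1, u_2) = \lambda_s \exp\left(-\frac{\|u_1 - u_2\|^2}{2 l_s^2}\right),$$

where $\lambda_s$ is the global spatial amplitude parameter and $l_s$ is the length-scale of the Gaussian spatial neighbourhood. This covariance structure is stationary with respect to the spatial coordinates. Furthermore, since the exponential of a sum is the product of exponentials, given two spatial locations $u_1 = (x_1, y_1, z_1)$ and $u_2 = (x_2, y_2, z_2)$ we have

$$\Sigma_S(u_1, u_2) = \lambda_s \exp\left(-\frac{(x_1 - x_2)^2}{2 l_s^2}\right) \exp\left(-\frac{(y_1 - y_2)^2}{2 l_s^2}\right) \exp\left(-\frac{(z_1 - z_2)^2}{2 l_s^2}\right) = K_x(x_1, x_2)\, K_y(y_1, y_2)\, K_z(z_1, z_2),$$

where $K_x$, $K_y$ and $K_z$ are one-dimensional squared exponential covariances (the amplitude $\lambda_s$ can be absorbed into any one of them).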
For this reason the covariance matrix $\Sigma_S$ can be further decomposed as the Kronecker product of covariance matrices of one-dimensional processes: $\Sigma = K_x \otimes K_y \otimes K_z \otimes \Sigma_T$. The model is thus represented by the product of independent covariances of much smaller size ($N \times N$ for each spatial axis and $N_T \times N_T$ in time, instead of a single $N^3 N_T \times N^3 N_T$ matrix), and is completely identified by the spatial, temporal and noise parameters. In particular, the proposed model is flexible with respect to the temporal covariance matrix $\Sigma_T$, which can express a complex mixed-effects structure and can account for covariates and different progression models. For instance, in this work the matrix $\Sigma_T$ is first specified to model the temporal progression observed in time series of images (Section 4), and then used to model the influence of anatomical, genetic, clinical and sociodemographic covariates on individual atrophy rates obtained through non-linear registration (Section 5).
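As a quick numerical check of the separability argument, the sketch below builds the full $N^3 \times N^3$ squared exponential covariance on a tiny regular grid and compares it with $\lambda_s K_x \otimes K_y \otimes K_z$. It assumes unit voxel spacing and C-order (x slowest, z fastest) vectorization; the function names are illustrative only and are not part of the original work.

```python
import numpy as np

def se_1d(coords, length_scale, amplitude=1.0):
    """1-D squared exponential covariance matrix on a vector of coordinates."""
    d = coords[:, None] - coords[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def spatial_cov_dense(n, length_scale, amplitude):
    """Full N^3 x N^3 covariance on an N x N x N grid (only feasible for tiny N)."""
    g = np.arange(n, dtype=float)
    # Grid points ordered with x slowest and z fastest (C order), matching np.kron.
    pts = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
    sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    return amplitude * np.exp(-sq_dist / (2.0 * length_scale**2))

def spatial_cov_kron(n, length_scale, amplitude):
    """Same covariance assembled as lambda_s * (K_x kron K_y kron K_z)."""
    k = se_1d(np.arange(n, dtype=float), length_scale)
    return amplitude * np.kron(np.kron(k, k), k)

n, l_s, lambda_s = 4, 1.3, 0.7
dense_cov = spatial_cov_dense(n, l_s, lambda_s)
kron_cov = spatial_cov_kron(n, l_s, lambda_s)
print(dense_cov.shape, np.allclose(dense_cov, kron_cov))  # (64, 64) True
```

In practice only the three $N \times N$ factors need to be stored: for $N = 100$ this is $3 \times 10^4$ numbers instead of the $10^{12}$ entries of the dense spatial covariance.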
3. Inference in Gaussian processes with Kronecker structure
The GP-based generative model with Kronecker covariance structure outlined in this work provides a powerful and efficient framework for prediction using image time series. Here we provide the main results concerning the marginal likelihood computation, the hyper-parameter optimization and the posterior prediction.
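Let $K_x = U_x D_x U_x^\top$ (and analogously $K_y = U_y D_y U_y^\top$, $K_z = U_z D_z U_z^\top$) and $\Sigma_T = U_T D_T U_T^\top$ be the eigendecompositions of the one-dimensional spatial and temporal covariance matrices, with orthogonal eigenvector matrices $U$ and diagonal eigenvalue matrices $D$. These small eigendecomposition problems can be easily and efficiently solved beforehand. We further introduce the shorthand notation $\otimes A = A_x \otimes A_y \otimes A_z$.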
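The eigendecomposition of a Kronecker product follows from those of its factors: the eigenvectors are $\otimes U \otimes U_T$ and the eigenvalues are the Kronecker product of the factor eigenvalues. The NumPy sketch below illustrates this on hypothetical small factors (two spatial axes and one temporal axis, for brevity); the dense matrices are formed only to check the result.

```python
import numpy as np

def se_1d(coords, length_scale, amplitude=1.0):
    """1-D squared exponential covariance matrix on a vector of coordinates."""
    d = coords[:, None] - coords[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def factor_eigs(factors):
    """(eigenvalues, eigenvectors) of each small symmetric factor."""
    return [np.linalg.eigh(k) for k in factors]

def kron_eigenvalues(eigs):
    """Diagonal of the Kronecker product of the D matrices, as one vector."""
    d = np.ones(1)
    for vals, _ in eigs:
        d = np.kron(d, vals)
    return d

# Hypothetical factors: two spatial axes and one temporal axis.
kx = se_1d(np.arange(3.0), 1.0, amplitude=0.7)
ky = se_1d(np.arange(4.0), 1.0)
kt = se_1d(np.array([0.0, 0.5, 1.0, 2.0]), 1.5)
eigs = factor_eigs([kx, ky, kt])
d = kron_eigenvalues(eigs)

# For checking only: dense Kronecker product and its eigenvector matrix.
full = np.kron(np.kron(kx, ky), kt)
u = np.kron(np.kron(eigs[0][1], eigs[1][1]), eigs[2][1])
print(np.allclose(u @ np.diag(d) @ u.T, full))  # True
```

The cost is three small eigenproblems of size $3$, $4$ and $4$ instead of one of size $48$; at image scale this is the difference between $N \times N$ and $N^3 N_T \times N^3 N_T$ problems.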
Log-Marginal Likelihood
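The log-marginal likelihood of model (1) is

$$\log p(I \mid \theta) = -\frac{1}{2} V_I^\top \left(\otimes D \otimes D_T + \sigma^2 \mathrm{Id}\right)^{-1} V_I - \frac{1}{2} \log\left|\otimes D \otimes D_T + \sigma^2 \mathrm{Id}\right| - \frac{N^3 N_T}{2} \log 2\pi, \qquad (2)$$

with $V_I = (\otimes U \otimes U_T)^\top I = \mathrm{vec}\left((U_x \otimes U_y)^\top \tilde{I}\, (U_z \otimes U_T)\right)$, where $\tilde{I}$ is the matricization of $I$ into a two-dimensional matrix of size $N^2 \times N N_T$ (consistent with the vectorization of $I$), and $D_x$, $D_y$, $D_z$ and $D_T$ are the eigenvalues of $K_x$, $K_y$, $K_z$ and $\Sigma_T$, respectively. The computation of the vector $V_I$ requires the storage and multiplication of matrices of relatively small size, respectively $N^2 \times N^2$, $N^2 \times N N_T$ and $N N_T \times N N_T$. The product $(\otimes D \otimes D_T + \sigma^2 \mathrm{Id})^{-1} V_I$ is finally computed as the solution of the linear system $(\otimes D \otimes D_T + \sigma^2 \mathrm{Id})\, x = V_I$, which is straightforward since $\otimes D \otimes D_T + \sigma^2 \mathrm{Id}$ is diagonal; for the same reason, the log-determinant reduces to the sum of the logarithms of its diagonal entries.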
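A minimal NumPy sketch of Eq. (2), assuming C-order vectorization of the image time series and squared exponential factors on hypothetical small grids. The helper `kron_mvp` applies each factor along its own axis of the reshaped array, which is algebraically the same as the matricized product $(U_x \otimes U_y)^\top \tilde{I} (U_z \otimes U_T)$ in the text; the names are illustrative, not the authors' code.

```python
import numpy as np

def se_1d(coords, length_scale, amplitude=1.0):
    """1-D squared exponential covariance matrix on a vector of coordinates."""
    d = coords[:, None] - coords[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def kron_mvp(mats, v):
    """(A_1 kron ... kron A_k) @ v without forming the Kronecker product."""
    x = v.reshape([m.shape[1] for m in mats])
    for axis, m in enumerate(mats):
        # Contract factor m with the current axis, then put the result axis back.
        x = np.moveaxis(np.tensordot(m, x, axes=([1], [axis])), 0, axis)
    return x.reshape(-1)

def kron_log_marginal_likelihood(factors, noise_var, y):
    """Eq. (2): log N(y | 0, kron(factors) + noise_var * Id)."""
    eigs = [np.linalg.eigh(k) for k in factors]
    d = np.ones(1)
    for vals, _ in eigs:
        d = np.kron(d, vals)                      # eigenvalues of the Kronecker product
    s = d + noise_var                             # diagonal of (kron D + sigma^2 Id)
    v = kron_mvp([u.T for _, u in eigs], y)       # V_I = (kron U)^T y
    quad = np.sum(v**2 / s)                       # diagonal system, solved elementwise
    logdet = np.sum(np.log(s))
    return -0.5 * quad - 0.5 * logdet - 0.5 * y.size * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
factors = [se_1d(np.arange(3.0), 1.2, 0.6), se_1d(np.arange(3.0), 1.2),
           se_1d(np.arange(3.0), 1.2), se_1d(np.array([0.0, 0.5, 1.0]), 2.0)]
y = rng.standard_normal(3 * 3 * 3 * 3)            # hypothetical 3x3x3 image, 3 time points
print(kron_log_marginal_likelihood(factors, 0.1, y))
```

Nothing larger than a single factor is ever inverted or decomposed, which is what keeps the computation tractable at image resolution.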
Hyperparameter optimization
The derivative of the log-likelihood (2) with respect to the model parameters θ is:
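$$\frac{\partial \log p(I \mid \theta)}{\partial \theta} = \frac{1}{2} I^\top C^{-1} \frac{\partial C}{\partial \theta} C^{-1} I - \frac{1}{2} \mathrm{tr}\left(C^{-1} \frac{\partial C}{\partial \theta}\right), \qquad C = \Sigma + \sigma^2 \mathrm{Id}. \qquad (3)$$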
Formula (3) can be computed efficiently with respect to each model parameter. For instance, since $\partial (\Sigma + \sigma^2 \mathrm{Id}) / \partial \sigma^2 = \mathrm{Id}$, the gradient with respect to the noise parameter takes the form:
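$$\frac{\partial \log p(I \mid \theta)}{\partial \sigma^2} = \frac{1}{2} V_I^\top \left(\otimes D \otimes D_T + \sigma^2 \mathrm{Id}\right)^{-2} V_I - \frac{1}{2} \mathrm{tr}\left[\left(\otimes D \otimes D_T + \sigma^2 \mathrm{Id}\right)^{-1}\right], \qquad V_I = (\otimes U \otimes U_T)^\top I. \qquad (4)$$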
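Eq. (4) only needs the vector $V_I$ and the diagonal eigenvalue matrix already used for Eq. (2). The sketch below (NumPy, hypothetical small squared exponential factors, illustrative names) returns both the log-marginal likelihood and its noise derivative, and checks the derivative against a central finite difference.

```python
import numpy as np

def se_1d(coords, length_scale, amplitude=1.0):
    """1-D squared exponential covariance matrix on a vector of coordinates."""
    d = coords[:, None] - coords[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def kron_mvp(mats, v):
    """(A_1 kron ... kron A_k) @ v without forming the Kronecker product."""
    x = v.reshape([m.shape[1] for m in mats])
    for axis, m in enumerate(mats):
        x = np.moveaxis(np.tensordot(m, x, axes=([1], [axis])), 0, axis)
    return x.reshape(-1)

def lml_and_noise_grad(factors, noise_var, y):
    """Log-marginal likelihood (2) and its derivative w.r.t. sigma^2, Eq. (4)."""
    eigs = [np.linalg.eigh(k) for k in factors]
    d = np.ones(1)
    for vals, _ in eigs:
        d = np.kron(d, vals)
    s = d + noise_var                                  # diagonal of (kron D + sigma^2 Id)
    v = kron_mvp([u.T for _, u in eigs], y)            # V_I
    lml = (-0.5 * np.sum(v**2 / s) - 0.5 * np.sum(np.log(s))
           - 0.5 * y.size * np.log(2.0 * np.pi))
    grad = 0.5 * np.sum(v**2 / s**2) - 0.5 * np.sum(1.0 / s)   # Eq. (4)
    return lml, grad

rng = np.random.default_rng(1)
factors = [se_1d(np.arange(4.0), 1.0, 0.5), se_1d(np.arange(3.0), 1.0),
           se_1d(np.array([0.0, 1.0]), 2.0)]
y = rng.standard_normal(4 * 3 * 2)
lml, grad = lml_and_noise_grad(factors, 0.2, y)

# Central finite-difference approximation of the same derivative.
h = 1e-6
fd = (lml_and_noise_grad(factors, 0.2 + h, y)[0]
      - lml_and_noise_grad(factors, 0.2 - h, y)[0]) / (2 * h)
```

Since the text optimises log-hyperparameters, the derivative with respect to $\log \sigma^2$ is obtained by the chain rule as `noise_var * grad`.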
Prediction
A major strength of a GP framework for image time series is that it easily enables probabilistic predictions based on given observations. The proposed generative model allows us to consider the predictive distributions of the latent spatio-temporal process at any testing locations $u^*$ and time points $t^*$. Given the image time series $I(u, t)$, we now aim at predicting the image at the testing coordinates $\{u^*, t^*\}$. Let us define $\Sigma_{I,I^*} = \Sigma(u, t, u^*, t^*)$, the cross-covariance matrix of training and testing data, and $\Sigma_{I^*,I^*} = \Sigma(u^*, t^*, u^*, t^*)$, the covariance evaluated on the new coordinates. The joint GP model of training and testing data is:
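$$\begin{pmatrix} I \\ I^* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} \Sigma + \sigma^2 \mathrm{Id} & \Sigma_{I,I^*} \\ \Sigma_{I^*,I} & \Sigma_{I^*,I^*} \end{pmatrix}\right), \qquad (5)$$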
and it can be easily shown that the posterior distribution of I* conditioned on the observed time series I and parameters θ is [8]:
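$$I^* \mid I, \theta \sim \mathcal{N}(\mu^*, \Sigma^*), \quad \mu^* = \Sigma_{I^*,I} (\Sigma + \sigma^2 \mathrm{Id})^{-1} I, \quad \Sigma^* = \Sigma_{I^*,I^*} - \Sigma_{I^*,I} (\Sigma + \sigma^2 \mathrm{Id})^{-1} \Sigma_{I,I^*}. \qquad (6)$$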
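In practice, by definition the new covariance matrices still have a Kronecker product form: $\Sigma_{I,I^*} = K_{x,x^*} \otimes K_{y,y^*} \otimes K_{z,z^*} \otimes \Sigma_{t,t^*}$ and $\Sigma_{I^*,I^*} = K_{x^*,x^*} \otimes K_{y^*,y^*} \otimes K_{z^*,z^*} \otimes \Sigma_{t^*,t^*}$. The predicted mean $\mu^*$ at coordinates $\{u^*, t^*\}$ is then

$$\mu^* = \left(K_{x^*,x} \otimes K_{y^*,y} \otimes K_{z^*,z} \otimes \Sigma_{t^*,t}\right) \left(\otimes U \otimes U_T\right) \left(\otimes D \otimes D_T + \sigma^2 \mathrm{Id}\right)^{-1} \left(\otimes U \otimes U_T\right)^\top I,$$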
which can be computed efficiently by noting that the matrix to be inverted is diagonal and by using the mixed-product property of the Kronecker product. While the posterior form (6) can also be used to evaluate the posterior covariance, some care is needed to obtain a tractable approach. Indeed, the covariance matrix $\Sigma^*$ combines $\Sigma$, $\Sigma_{I^*,I^*}$ and $\Sigma_{I^*,I}$, which are evaluated on different sets of spatial and temporal coordinates. As a result, the Kronecker structure is lost, and in the absence of further assumptions the matrix $\Sigma^*$ must be computed explicitly, which is generally impractical.
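The predictive mean formula can be sketched with the same per-factor operations: eigendecompose the training factors, divide by the diagonal, then apply the (possibly rectangular) cross-covariance factors. The NumPy example below uses two spatial axes plus time for brevity and hypothetical coordinates, and compares the result against the dense expression in (6).

```python
import numpy as np

def se_cross(a, b, length_scale, amplitude=1.0):
    """Squared exponential (cross-)covariance between 1-D coordinate sets a and b."""
    d = a[:, None] - b[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def kron_mvp(mats, v):
    """(A_1 kron ... kron A_k) @ v; factors may be rectangular."""
    x = v.reshape([m.shape[1] for m in mats])
    for axis, m in enumerate(mats):
        x = np.moveaxis(np.tensordot(m, x, axes=([1], [axis])), 0, axis)
    return x.reshape(-1)

def kron_predictive_mean(train_factors, cross_factors, noise_var, y):
    """mu* = (kron K_{*,.}) (kron U) (kron D + sigma^2 Id)^{-1} (kron U)^T y."""
    eigs = [np.linalg.eigh(k) for k in train_factors]
    d = np.ones(1)
    for vals, _ in eigs:
        d = np.kron(d, vals)
    v = kron_mvp([u.T for _, u in eigs], y)
    alpha = kron_mvp([u for _, u in eigs], v / (d + noise_var))  # (Sigma + sigma^2 Id)^{-1} y
    return kron_mvp(cross_factors, alpha)

# Hypothetical coordinates: two spatial axes and time, new positions and time points.
x, x_new = np.arange(4.0), np.array([0.5, 2.5])
z, z_new = np.arange(3.0), np.array([1.0])
t, t_new = np.array([0.0, 0.5, 1.0]), np.array([2.0, 3.0])
train = [se_cross(x, x, 1.0, 0.8), se_cross(z, z, 1.0), se_cross(t, t, 2.0)]
cross = [se_cross(x_new, x, 1.0, 0.8), se_cross(z_new, z, 1.0), se_cross(t_new, t, 2.0)]
y = np.random.default_rng(2).standard_normal(4 * 3 * 3)
mu_star = kron_predictive_mean(train, cross, 0.05, y)
print(mu_star.shape)  # (4,)
```

The posterior covariance $\Sigma^*$ has no such factorised form, which motivates the sampling approach that follows.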
Hoffman-Ribak method for posterior sampling
We propose to characterise the posterior distribution (6) by sampling, using the Hoffman-Ribak method (HR) introduced in the early 1990s in the astrophysics literature [11]. Given the Gaussian distribution (5), partitioned into training (observed) and testing (unobserved) components, the HR method provides a computationally efficient and exact algorithm for sampling from (6), consisting of the following two steps:
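1. Sample a random realization $(Y, Y^*)$ from the joint distribution (5);
2. Compute a sample $Z$ of the posterior (6) as $Z = Y^* + \Sigma_{I^*,I} (\Sigma + \sigma^2 \mathrm{Id})^{-1} (I - Y)$.

Because $Y$ and $Y^*$ have zero mean, $Z$ has mean $\mu^*$; expanding its covariance with $\mathrm{Cov}(Y) = \Sigma + \sigma^2 \mathrm{Id}$ and $\mathrm{Cov}(Y^*, Y) = \Sigma_{I^*,I}$ gives exactly $\Sigma^*$.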
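The two HR steps can be illustrated on a small dense problem before turning to the structured case. The NumPy sketch below assumes a 1-D squared exponential latent process with hypothetical training and testing locations; noise is added to the training block only, as in (5), and all samples are corrected at once by a single linear solve.

```python
import numpy as np

def se_cross(a, b, length_scale, amplitude=1.0):
    """Squared exponential (cross-)covariance between 1-D coordinate sets a and b."""
    d = a[:, None] - b[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def hoffman_ribak_samples(y_obs, cov_latent, n_train, noise_var, n_samples, rng):
    """Posterior samples of the latent test values via the Hoffman-Ribak method.

    cov_latent is the prior covariance of the latent process at [training; testing]
    points. Step 1 draws (Y, Y*) from the joint (5); step 2 applies the correction.
    """
    n = cov_latent.shape[0]
    chol = np.linalg.cholesky(cov_latent + 1e-9 * np.eye(n))   # small jitter for stability
    f = rng.standard_normal((n_samples, n)) @ chol.T           # latent prior samples (rows)
    y_sim = f[:, :n_train] + np.sqrt(noise_var) * rng.standard_normal((n_samples, n_train))
    y_star = f[:, n_train:]
    k_train = cov_latent[:n_train, :n_train] + noise_var * np.eye(n_train)
    k_star_train = cov_latent[n_train:, :n_train]
    # Z = Y* + Sigma_{I*,I} (Sigma + sigma^2 Id)^{-1} (I - Y), for all samples at once.
    correction = np.linalg.solve(k_train, (y_obs[None, :] - y_sim).T)
    return y_star + (k_star_train @ correction).T

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x_test = np.array([0.5, 2.5, 5.0])
pts = np.concatenate([x_train, x_test])
cov = se_cross(pts, pts, 1.0)
y_obs = np.sin(x_train)                          # hypothetical observations
rng = np.random.default_rng(3)
z = hoffman_ribak_samples(y_obs, cov, len(x_train), 0.1, 100_000, rng)
```

The sample mean and covariance of `z` approach $\mu^*$ and $\Sigma^*$ from (6) as the number of samples grows, without ever forming $\Sigma^*$.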
Despite its simple formulation, the HR method cannot be straightforwardly applied in our case, as sampling from the very high-dimensional joint distribution is generally prohibitive. Therefore, instead of predicting time series at arbitrary spatial and temporal coordinates, we provide here an efficient scheme for spatio-temporal prediction at arbitrary time points $T^* = \{t^*\}$ evaluated at the same spatial coordinates as the training image time series $I$. Under this assumption the matrices $\Sigma$, $\Sigma_{I^*,I^*}$ and $\Sigma_{I^*,I}$ differ in the temporal part only,

$$\Sigma = \Sigma_S \otimes \Sigma_T, \qquad \Sigma_{I^*,I^*} = \Sigma_S \otimes \Sigma_{T^*,T^*}, \qquad \Sigma_{I^*,I} = \Sigma_S \otimes \Sigma_{T^*,T},$$
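and it is simple to show that the joint covariance is $\Sigma_{joint} = P(\Sigma_S \otimes \Sigma_{T_j} + \sigma^2 \mathrm{Id})P^\top$, where $P$ is a structured permutation matrix that reorders the entries into training and testing blocks, and

$$\Sigma_{T_j} = \begin{pmatrix} \Sigma_{T} & \Sigma_{T,T^*} \\ \Sigma_{T^*,T} & \Sigma_{T^*,T^*} \end{pmatrix}$$

is the temporal covariance evaluated jointly on the training and testing time points. A sample from the joint distribution can thus be easily computed as $P U \Lambda X$, where $X$ is a standard multivariate normal vector and $U \Lambda^2 U^\top$ is the eigendecomposition of the covariance $\Sigma_S \otimes \Sigma_{T_j} + \sigma^2 \mathrm{Id}$. The eigendecomposition and the matrix multiplications can be computed efficiently by virtue of the properties of the Kronecker product.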
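A sketch of this structured HR scheme in NumPy, under stated assumptions: squared exponential temporal covariance with unit amplitude, hypothetical small spatial factors, and C-order vectorization with time as the fastest axis, so that splitting the time axis plays the role of the permutation $P$. Unlike the formula above, this sketch samples the latent process from $\Sigma_S \otimes \Sigma_{T_j}$ and adds noise to the training block only, so that it targets (5) exactly. The standard-normal inputs are passed in explicitly, which makes the mapping easy to test.

```python
import numpy as np

def se_cross(a, b, length_scale, amplitude=1.0):
    """Squared exponential (cross-)covariance between 1-D coordinate sets a and b."""
    d = a[:, None] - b[None, :]
    return amplitude * np.exp(-d**2 / (2.0 * length_scale**2))

def kron_mvp(mats, v):
    """(A_1 kron ... kron A_k) @ v; factors may be rectangular."""
    x = v.reshape([m.shape[1] for m in mats])
    for axis, m in enumerate(mats):
        x = np.moveaxis(np.tensordot(m, x, axes=([1], [axis])), 0, axis)
    return x.reshape(-1)

def kron_eig(mats):
    """Per-factor eigenvectors and the eigenvalues of the Kronecker product."""
    eigs = [np.linalg.eigh(m) for m in mats]
    d = np.ones(1)
    for vals, _ in eigs:
        d = np.kron(d, vals)
    return [u for _, u in eigs], d

def sample_joint_latent(spatial_factors, cov_t_joint, x_std):
    """Latent sample with covariance Sigma_S kron Sigma_Tj, as (kron U) Lambda X."""
    us, d = kron_eig(list(spatial_factors) + [cov_t_joint])
    lam = np.sqrt(np.clip(d, 0.0, None))      # clip tiny negative round-off eigenvalues
    return kron_mvp(us, lam * x_std)

def hr_kron_sample(y, spatial_factors, t_train, t_test, ls_t, noise_var, x_std, eps):
    """One HR posterior sample at new time points on the training spatial grid."""
    nt = len(t_train)
    t_all = np.concatenate([t_train, t_test])
    cov_tj = se_cross(t_all, t_all, ls_t)
    shape = [k.shape[0] for k in spatial_factors] + [len(t_all)]
    f = sample_joint_latent(spatial_factors, cov_tj, x_std).reshape(shape)
    # Splitting the time axis plays the role of the permutation P.
    y_sim = f[..., :nt].reshape(-1) + np.sqrt(noise_var) * eps
    y_star = f[..., nt:].reshape(-1)
    # (Sigma + sigma^2 Id)^{-1} (y - y_sim), with Sigma = Sigma_S kron Sigma_T.
    us, d = kron_eig(list(spatial_factors) + [cov_tj[:nt, :nt]])
    alpha = kron_mvp(us, kron_mvp([u.T for u in us], y - y_sim) / (d + noise_var))
    # Sigma_{I*,I} = Sigma_S kron Sigma_{T*,T}
    return y_star + kron_mvp(list(spatial_factors) + [cov_tj[nt:, :nt]], alpha)

sx = se_cross(np.arange(3.0), np.arange(3.0), 1.0, 0.5)
sy = se_cross(np.arange(2.0), np.arange(2.0), 1.0)
t_train, t_test = np.array([0.0, 0.5, 1.0]), np.array([2.0])
rng = np.random.default_rng(4)
y = rng.standard_normal(3 * 2 * 3)
z = hr_kron_sample(y, [sx, sy], t_train, t_test, 2.0, 0.05,
                   rng.standard_normal(3 * 2 * 4), rng.standard_normal(3 * 2 * 3))
print(z.shape)  # (6,)
```

With `x_std` and `eps` set to zero the sample reduces to the posterior mean $\mu^*$, a convenient sanity check.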
In the following sections, after validating the proposed framework in a controlled setting, we provide a modelling application in the context of longitudinal modelling in AD.
4. Model Validation
Estimation of the Spatio-temporal Properties in Synthetic Data
Here, we test the ability of the proposed GP model to correctly estimate the spatial and temporal properties prescribed in synthetic data. We chose a time series of brain MRIs composed of 6 aligned longitudinal gray matter (GM) segmentation images of an example ADNI patient, and we applied Gaussian smoothing to obtain synthetic samples of a spatio-temporal process with predefined spatial correlation and signal-to-noise ratio. Moreover, we generated synthetic longitudinal progressions of increasing temporal complexity, following voxel-wise linear, quadratic and cubic functions of time, respectively, estimated through a general linear model (GLM). The longitudinal changes in the synthetic time series were then modelled with the proposed GP model, using a squared exponential temporal covariance parameterised by the temporal length-scale $l_t$. A maximum-a-posteriori (MAP) estimate of the parameters was obtained with a Gauss-Newton optimization scheme on the log-hyperparameters, using a broad multivariate Gaussian hyperprior on the log-hyperparameters with mean $\mu_h = [-2, -2, 0, 3]$ and covariance $\Sigma_h = \mathrm{diag}([5, 5, 1, 5])$ for $(\sigma^2, l_s, \lambda_s, l_t)$, respectively.
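The MAP objective combines the log-marginal likelihood (2) with the Gaussian hyperprior on the log-hyperparameters. The NumPy sketch below encodes the hyperprior with the mean and covariance reported above and forms the negative log-posterior; the `log_marginal_likelihood` callable is a placeholder for any implementation of Eq. (2), and the Gauss-Newton optimizer itself is not reproduced here.

```python
import numpy as np

# Hyperprior on (log sigma^2, log l_s, log lambda_s, log l_t), values from the text.
MU_H = np.array([-2.0, -2.0, 0.0, 3.0])
SIGMA_H = np.diag([5.0, 5.0, 1.0, 5.0])

# Variant with an informative prior on log l_t (mean 3, variance 0.1),
# as used for the within-subject experiments.
SIGMA_H_WITHIN = SIGMA_H.copy()
SIGMA_H_WITHIN[3, 3] = 0.1

def log_hyperprior(log_theta, mu=MU_H, cov=SIGMA_H):
    """Log-density of the multivariate Gaussian hyperprior at log_theta."""
    diff = log_theta - mu
    _, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * logdet

def neg_log_posterior(log_theta, log_marginal_likelihood, cov=SIGMA_H):
    """MAP objective: -(log p(I | theta) + log p(log theta)).

    log_marginal_likelihood is a placeholder callable mapping theta to Eq. (2).
    """
    theta = np.exp(log_theta)          # optimisation runs on log-hyperparameters
    return -(log_marginal_likelihood(theta) + log_hyperprior(log_theta, cov=cov))
```

Working on log-hyperparameters keeps the positive parameters $(\sigma^2, l_s, \lambda_s, l_t)$ unconstrained during optimisation.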
Table 1 shows the relationship between the spatio-temporal properties of the synthetic data and the MAP estimates of the GP parameters. Notably, the estimated spatial length-scale closely matches the smoothness of the synthetic data (e.g. $l_s = 2.3$ for 2 mm smoothing), adaptively accounting for image smoothness properties. Additionally, the estimated temporal length-scale decreased when modelling longitudinal progressions of higher order. Thus, the model also correctly reflects the increased complexity of the temporal changes.
Table 1. Spatio-temporal properties of the synthetic data and MAP estimates of the GP parameters.

Spatial smoothness (mm) | $l_s$ | $\sigma^2$ | $\lambda_s$ |
---|---|---|---|
0 | 0.09 | 9e-6 | 0.7 |
0.5 | 0.81 | 5e-6 | 0.64 |
1 | 1.2 | 3e-6 | 0.53 |
2 | 2.3 | 1e-6 | 0.5 |
3 | 3.3 | 3e-10 | 0.48 |
4 | 4.3 | 7e-11 | 0.47 |

Temporal progression | $\log l_t$ |
---|---|
linear | 4.3 |
quadratic | 1.79 |
cubic | 1.72 |
Within-Subject Modelling and Prediction of Longitudinal Changes
We chose high-resolution longitudinal images of 10 AD patients, 10 patients with mild cognitive impairment subsequently converting to AD (MCIc), and 10 healthy controls (HC) from the ADNI dataset. AD patients and HC had 4 images per participant, corresponding to the baseline, 6-month, 1-year and 2-year scans, while for MCIc patients additional images corresponding to 3 or 4 years were available. The images were processed according to established procedures consisting of joint bias correction, tissue segmentation, alignment to the within-subject average anatomy, and non-linear normalization to a group-wise anatomical reference [12]. The final image size was 100³ voxels with an isotropic resolution of 1.5 mm.
The longitudinal changes in the resulting time series of processed gray matter density maps were modelled with the proposed GP model. The model was estimated for each subject using 3 training images, corresponding to the baseline, 6-month and 1-year scans. In order to capture meaningful non-linear trends during disease progression to AD, we also applied the GP model to the MCIc group using 4 and 5 training images, corresponding to the time range from baseline to the 2- and 3-year follow-up, respectively.
We applied the optimization scheme described above for the synthetic data, while imposing an informative Gaussian prior on the log temporal length-scale $\log l_t$, with mean 3 and variance 0.1. This choice was motivated by the experimental results in Table 1, in order to promote a moderately non-linear behaviour of the GP model while avoiding overfitting on the limited number of within-subject observations. The resulting computational time for the parameter estimation was about 5 minutes per subject on a standard PC (2.6 GHz quad-core CPU, 16 GB RAM). The predictive accuracy of the model was then tested by voxel-wise comparison of the extrapolated image series with the corresponding ground-truth follow-up images, and compared with standard linear and quadratic voxel-wise within-subject GLMs. The group-wise average voxel-wise absolute differences between extrapolated and real images are shown in Figure 1. Errors were generally found to be proportional to the extrapolation time. Table 2 shows that the results of the GP model are comparable to those obtained by linear modelling when training on 3 time points only. However, the predictions of the GP model significantly improve on the linear ones when more training points are used. This indicates that the GP model is able to capture meaningful accelerations of the time process when sufficient data is provided, while it remains essentially linear otherwise. Figure 2 shows the mean hippocampal progression and the associated confidence interval from the posterior latent process for an MCIc patient. The GP-based model of hippocampal loss is non-linear and reasonably predicts the acceleration of volume loss observed in the follow-up testing images.
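For reference, the GLM baselines and the error measure can be sketched as follows: a voxel-wise polynomial in time fitted by least squares, extrapolated to follow-up times, and scored by the mean voxel-wise absolute difference. This is a minimal NumPy illustration on hypothetical data, not the exact pipeline used for Table 2.

```python
import numpy as np

def glm_extrapolate(images, times, t_new, degree):
    """Voxel-wise polynomial-in-time GLM, fitted by least squares and evaluated at t_new.

    images has shape (n_times, *spatial_shape); returns (len(t_new), *spatial_shape).
    """
    design = np.vander(np.asarray(times, float), degree + 1)   # columns t^degree, ..., t, 1
    flat = images.reshape(len(times), -1)                        # time x voxels
    beta, *_ = np.linalg.lstsq(design, flat, rcond=None)
    pred = np.vander(np.atleast_1d(np.asarray(t_new, float)), degree + 1) @ beta
    return pred.reshape((-1,) + images.shape[1:])

def mean_abs_error(pred, truth):
    """Average voxel-wise absolute difference between predicted and observed images."""
    return float(np.mean(np.abs(pred - truth)))

# Hypothetical images changing linearly in time (baseline, 6 months, 1 year).
rng = np.random.default_rng(5)
a, b = rng.random((4, 4, 4)), rng.random((4, 4, 4))
times = np.array([0.0, 0.5, 1.0])
train = np.stack([a + b * t for t in times])
truth = (a + b * 2.0)[None]                     # "follow-up" at 2 years
errors = {deg: mean_abs_error(glm_extrapolate(train, times, [2.0], deg), truth)
          for deg in (1, 2)}
```

With three time points the quadratic model has as many coefficients as observations, so it interpolates the training images exactly and has no residual degrees of freedom.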
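Table 2. Prediction error of the GP model and of the linear and quadratic within-subject GLMs, by group and number of training time points (*: significant improvement over the linear GLM).

Group | AD | HC | MCIc | MCIc |
---|---|---|---|---|
N train points | 3 | 3 | 4 | 5 |
GP | 1.9 | 1.9 | 2.9* | 2.5* |
GLM linear | 1.9 | 2 | 3.1 | 2.7 |
GLM quadratic | 6.7 | 2.6 | 8.7 | 5.4 |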
5. Application: Between-Subjects Prediction of Individual Rates of Ventricle Growth using Multi-Kernel Learning
In this second application, we exploit the flexibility of our model to make covariate-based predictions of individual rates of atrophy in elderly subjects. In contrast to typical multivariate models which predict or classify scalar values, our GP framework allows prediction of images. In particular, we here focus on predicting the rate of volumetric growth in the lateral ventricle regions.
Firstly, we used computational morphometry to obtain the rates of atrophy in a large sample from the ADNI longitudinal dataset. To obtain these features for training and testing, we used 1143 and 569 MRI scans of 206 and 105 elderly subjects, respectively (ages 59-91, mean ± std age: 76.0 ± 6.0 years). In order to enable predictions across a broad range of clinical states, the sample was pooled across clinical groups: it contained 111 healthy elderly subjects, 108 subjects with stable MCI and 92 subjects with progressive MCI. After longitudinal registration, tissue segmentation and inter-subject alignment [12], we calculated each subject’s ventricle growth rate from the registered CSF images using a linear model.
Secondly, using the preprocessed images as features, we considered a special case of the generative model (1) to implement a prediction model based on individual subjects’ covariates (e.g. age and cognitive scores). This is realized by a different choice of the covariance function $\Sigma_T$ compared to the within-subject application above. In order to enable a prediction based on multiple available covariate sets (e.g. genes and clinical scores), we used an additive multi-kernel learning covariance
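$$\Sigma_T(c_1, c_2) = \sum_{r} K_r(c_1^r, c_2^r) = \sum_{r} \alpha_r \exp\left(-\frac{1}{2} (c_1^r - c_2^r)^\top M_r (c_1^r - c_2^r)\right), \qquad (7)$$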
using a sum of (up to four) squared exponential covariances $K_r$ with amplitudes $\alpha_r$, where $c_1^r$ and $c_2^r$ denote pairs of covariate vectors from each of the (up to four) covariate sets. The symmetric matrices $M_r$ were chosen to be either $M_{ISO} = \ell^{-2}\mathrm{Id}$ or $M_{ARD} = \mathrm{diag}(\ell)^{-2}$. As in typical GP regression applications, (7) explicitly models the covariance of the (latent) observations $f$ as a function of the similarity of the inputs $c$ (here the covariate vectors of subjects). This implements the idea that subjects with similar covariates are expected to have similar rates of atrophy. In particular, the choice $M_r = M_{ISO}$ parametrizes an isotropic covariance assuming an equal length-scale for all covariates of the same set, while $M_r = M_{ARD}$ implements automatic relevance determination (ARD), with a separate length-scale estimated for each variable. We compared increasingly complex prediction models using (1) only global brain volumes (tgmv, twmc, tcsv), (2) additionally demography (age, sex, education, marital status, year of retirement), (3) also the genetic risk, in terms of the number of ApoE4 alleles, and (4) finally also the clinical neuropsychological test scores MMSE, ADAS and CDR. Models (1) to (4) thus step by step increase the amount of subject-specific information used to predict maps of rates of ventricle growth. Models were compared using the log marginal likelihood, which balances model fit and model complexity across models with different numbers of hyperparameters. Under the ARD covariance the marginal likelihood increased with model complexity (Table 3), whereas under the ISO covariance the model evidence decreased for model 4. The highest marginal likelihood was observed for ARD model 4, which includes all predictors. Consistently, ARD model 4 also yielded the lowest mean absolute error in an independent test sample of 105 subjects (Table 3; error maps in Figure 3A), supporting its prediction accuracy and generalization ability.
Results also showed a correlation of up to 0.52 between predicted and true growth rates (Figure 3B).
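The multi-kernel covariance (7) can be sketched in a few lines of NumPy: each covariate set contributes a squared exponential term, and passing a scalar length-scale gives $M_{ISO}$ while a vector gives $M_{ARD}$. The covariate arrays below are hypothetical standardised values for five subjects, not ADNI data.

```python
import numpy as np

def se_covariate_kernel(c1, c2, amplitude, length_scale):
    """alpha * exp(-0.5 (c1 - c2)^T M (c1 - c2)) between rows of c1 and c2.

    A scalar length_scale gives M_ISO = l^-2 Id; a vector gives M_ARD = diag(l)^-2.
    """
    a = c1 / length_scale
    b = c2 / length_scale
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return amplitude * np.exp(-0.5 * sq)

def multi_kernel_cov(sets1, sets2, amplitudes, length_scales):
    """Eq. (7): sum over covariate sets r of alpha_r times an SE kernel on that set."""
    return sum(se_covariate_kernel(c1, c2, amp, ls)
               for c1, c2, amp, ls in zip(sets1, sets2, amplitudes, length_scales))

rng = np.random.default_rng(6)
volumes = rng.standard_normal((5, 3))     # hypothetical standardised global volumes
demo = rng.standard_normal((5, 5))        # hypothetical standardised demographics
k_iso = multi_kernel_cov([volumes, demo], [volumes, demo], [1.0, 0.5], [2.0, 3.0])
k_ard = multi_kernel_cov([volumes, demo], [volumes, demo], [1.0, 0.5],
                         [np.array([1.0, 2.0, 4.0]), np.full(5, 3.0)])
```

ISO adds one length-scale per covariate set and ARD one per variable, which is the difference in model complexity that the marginal likelihood comparison has to balance.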
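Table 3. Log marginal likelihood (ml) of models 1-4 with isotropic (ISO) and ARD covariances, and mean absolute test error (mae) of the ARD models.

Model | ml (ISO) | ml (ARD) | mae (ARD) |
---|---|---|---|
1 | 1.6697 | 1.6769 | 0.0059 |
2 | 2.4309 | 2.0249 | 0.0058 |
3 | 2.4356 | 2.0513 | 0.0080 |
4 | 2.2768 | 2.4434 | 0.0057 |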
6. Conclusions
We presented a novel framework for modelling and prediction of spatio-temporal processes in image time series. It is flexible and computationally efficient thanks to the proposed Kronecker structure of the covariance and to the use of the Hoffman-Ribak method for efficient sampling from the posterior. Our model provided promising results when tested in very different experimental scenarios concerning longitudinal modelling in AD, and opens the path to the effective use of GPs for the generative modelling of neuroimaging data. The strength of the framework rests on the assumption of separability of spatial and temporal processes. We showed that this assumption leads to meaningful results when applied to longitudinal modelling in AD, where the expected pathological changes are generally mild and slowly varying across brain regions. This assumption might be relaxed in future work in order to also model spatially varying processes that may underlie biological progressions with different properties. It may indeed be possible to further extend the framework to allow non-stationary correlations and noise models without compromising the computational efficiency, by accounting for locally stationary, smoothly varying processes as previously proposed in geostatistics [13]. Finally, further extensions of the proposed work will be devoted to the group-wise non-parametric mixed-effect modelling of disease progression in clinical cohorts such as ADNI, by exploiting the flexibility of the proposed spatio-temporal covariance structure in accounting for subject- and group-specific progressions and confounders.
Acknowledgements
Marco Lorenzi is grateful to Prof. John Ashburner, for his help in finalizing this work, and to Dr. Richard Turner, for his precious suggestions on the train toward London. Sebastien Ourselin receives funding from the EPSRC (EP/H046410/1, EP/J020990/1, EP/K005278), the MRC (MR/J01107X/1), the EU-FP7 project VPH-DARE@IT (FP7-ICT-2011-9-601055), the NIHR Biomedical Research Unit (Dementia) at UCL and the National Institute for Health Research University College London Hospitals Biomedical Research Centre (NIHR BRC UCLH/UCL High Impact Initiative- BW.mn.BRC10269). Gabriel Ziegler is supported in part by the German Academic Exchange Service (DAAD). The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust [grant number 091593/Z/10/Z].
Footnotes
³ For simplicity we focus on an even sampling across spatial directions, although the generalization of the proposed model to the uneven case is straightforward.
References
1. Davis BC, Fletcher PT, Bullitt E, Joshi SC. Population shape regression from random design data. IJCV. 2010;90(2):255–266.
2. Ashburner J, Ridgway G. Symmetric diffeomorphic modeling of longitudinal structural MRI. Frontiers in Neuroscience. 2013 Feb;6(197). doi: 10.3389/fnins.2012.00197.
3. Niethammer M, Huang Y, Vialard FX. Geodesic regression for image time-series. MICCAI. 2011:655–662. doi: 10.1007/978-3-642-23629-7_80.
4. Lorenzi M, Ayache N, Frisoni GB, Pennec X. Mapping the effects of Aβ 1-42 levels on the longitudinal changes in healthy aging: hierarchical modeling based on stationary velocity fields. MICCAI. 2011:663–670. doi: 10.1007/978-3-642-23629-7_81.
5. Friston KJ, Holmes A, Worsley KJ. Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping. 1995;2:189–210.
6. Flandin G, Penny WD. Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage. 2007;34(3):1108–1125. doi: 10.1016/j.neuroimage.2006.10.005.
7. Harrison LM, Green GG. A Bayesian spatiotemporal model for very large data sets. NeuroImage. 2010;50(3):1126–1141. doi: 10.1016/j.neuroimage.2009.12.042.
8. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. The MIT Press; 2005.
9. Ziegler G, Ridgway GR, Dahnke R, Gaser C. Individualized Gaussian process-based prediction and detection of local and global gray matter abnormalities in elderly subjects. NeuroImage. 2014 Apr;97:333–348. doi: 10.1016/j.neuroimage.2014.04.018.
10. Stegle O, Lippert C, Mooij JM, et al. Efficient inference in matrix-variate Gaussian models with iid observation noise. Advances in Neural Information Processing Systems 24. 2011.
11. Hoffman Y, Ribak E. Constrained realizations of Gaussian fields: a simple algorithm. Astrophys J Lett. 1991 Oct;380:L5–L8.
12. Ashburner J, Friston K. Unified segmentation. NeuroImage. 2005;26:839–851. doi: 10.1016/j.neuroimage.2005.02.018.
13. Gelfand A, Fuentes M, Guttorp P, Diggle P. Handbook of Spatial Statistics. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. Taylor & Francis; 2010.